Abstract: The objective of this work is to establish an
information system that facilitates decision making, built on a
model of the main university stakeholders (teachers, students and
administrators). The system is based on the relationship between
the actors on the one hand and their activities and aggregations
at a graduate level on the other. It aims to give university
managers a set of dashboards that can help improve the quality of
education.

Meanwhile, ontologies are now at the heart of knowledge
engineering work and have proven useful in several areas. An
ontology dedicated to the design of decision support systems (SD)
is a model for organizing the knowledge of a given domain. It
represents the multidimensional concepts of a domain together
with their multidimensional relationships (LDR) and semantics.
After recalling our method for constructing a decision ontology,
we detail in this paper the optimization phase of RMD after the
enrichment of the ontology. This phase is governed by
optimization rules that we define.

Keywords-Data warehouse conceptual modeling, ontology 
integration, information systems decisions. 



I. Introduction

In this paper we are interested in modeling university actors and
academic resources from a systemic point of view, in order to
evolve the academic information system into a decision-support
information system.

The model we use is a decision ontology. It serves as a means of
assistance that connects the actors to the expression of their
academic activities and to the related aggregations. The recourse
to an ontology is motivated by its ability to resolve semantic
and syntactic ambiguities. Indeed, it is a repository that
contains a set of concepts and their relationships that
characterize a given domain.

After recalling our initial problem, we begin with the creation
of data warehouses, which answer the problem of integrating a
large amount of varied data that relates to one scope but is
physically stored in different data sources. The data warehouse
holds, in a form usable by decision-support processing, the
information extracted from these sources that is potentially
relevant to a particular category of decision makers in academia.

A. Principle of a data warehouse

A warehouse is defined as a collection of integrated,
subject-oriented, non-volatile, historized and summarized data,
available for query and analysis. The data warehouse stores the
data necessary for decision making. It is fed and updated through
extractions from production databases, which the decision-making
chain treats as data sources.

B. Hypotheses

The hypothesis of this study is that if we start by modeling the
university actors (teachers, administrators and students)
upstream, taking into account the principles of decision-making
and ontology as well as the specifications and expectations of
each actor, the result will be an improvement in the satisfaction
of the actors. This hypothesis is applied to a Moroccan
university.

II. Issues 

We start by modeling [4] the actors upstream, taking into account
the requirements and expectations of each of them, namely:

- The student, who wishes to receive quality training and to
acquire skills that facilitate integration into working life.

- The teacher, whose task is to produce and transfer knowledge.

- The administrative staff, whose task is on the one hand to
facilitate the work of the teacher and to serve students by
disseminating and sharing information, and on the other hand to
meet the needs of clients outside the university.

Given this situation, the tasks of the student [1], [2], the
teacher and the administrators must be correlated. In fact, we
aim at customer/user satisfaction, with a university-specific
notion of actor/user, much as a business does. The two differ,
however. A company's approach to governance is driven by profit,
while that of a university is about positioning and achieving
visibility for the organization. The company seeks a return on
its capital, whereas the university aims to achieve quality and a
high ranking both nationally and internationally. The company
seeks customer satisfaction, whereas the university seeks to
satisfy its stakeholders. Customer satisfaction in business is
formalized in terms of costs, while satisfaction in a university
comes from meeting the stakeholders' needs.

A. Modeling the actors 

Previously (in [1], [2]), we showed that applications at the
actor level are based on information gathered from the databases
(DB Apg, DB EN, DB AD, DB AC, DB AG, DB DOC). The design of a
Decision Support Information System [3] requires a special
approach to complex design and modeling [4]. We adopted a model
that meets specific needs such as factor analysis [5], whose
purpose is to facilitate the understanding and interpretation of
a large set of multidimensional data. This analysis shows
graphically the similarities between the data and quantifies the
degree of correlation between several factors.
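
The step of quantifying correlation between factors can be
illustrated with a short sketch. It covers only the pairwise
correlation step, not a full factor analysis, and the factor
names and observations below are invented for illustration; they
do not come from the databases cited above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(factors):
    """Pairwise correlations between every pair of named factors."""
    names = list(factors)
    return {a: {b: pearson(factors[a], factors[b]) for b in names}
            for a in names}


# Hypothetical observations, one value per student (assumed data).
factors = {
    "attendance_rate": [0.9, 0.6, 0.8, 0.4, 0.7],
    "average_grade": [15.0, 10.0, 13.0, 8.0, 12.0],
    "library_visits": [4, 9, 3, 7, 5],
}
matrix = correlation_matrix(factors)
```

Each entry of `matrix` lies between -1 and 1; values close to
either extreme point to factors that vary together and are
candidates for grouping in a later factor analysis.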

The model we obtain includes all actors involved in the
university system. It is as follows:

With: S: source of activities for all actors. C: category. A: all
aggregations.

III. Basic Concepts for the Decision Ontology

The term 'ontology' has multiple definitions (in [8], [9]).
One of the simplest and most popular is that of Gruber (Gruber,
1993): an ontology is an explicit formal specification of a
shared conceptualization. A formal ontology defines a common
vocabulary for users who need to share information in a
particular field. Ontologies make it possible to (Pierre, 2005):

- Share a common understanding of the structure of
information.

- Allow the reuse of knowledge in a domain.

- Make explicit what is considered implicit in a domain.

- Distinguish domain knowledge from operational knowledge.

Our approach to building a decision ontology is progressive and
iterative (in [6], [7]). The intervention of the designer
throughout this construction is necessary. Indeed, it allows the
results obtained to be approved and some ambiguities to be
resolved, and it makes it possible to test the reliability of our
proposed model.




After the consolidation of formula 1, we obtain:
The source portfolio (S) defines all the activities to be
performed during one cycle by each university actor. Category (C)
defines the three actors of the university: Student, Teacher, and
Administrator. Aggregation (A) defines the needs of each actor
for a graduate level.

Beginning of the University Cycle:

The Administrative actor portfolio is the first to intervene, at
time $t$:

$PA = \{S_i\ (1 < i < 3);\ A_j\ (1 < j < 6)\}$

The Teacher actor portfolio is the second, at time
$t + \Delta t$:

$PE = \{S_i\ (3 < i < 8);\ A_j\ (6 < j < 11)\}$

The Student actor portfolio is the third, at time
$t + \Delta t + 1$:

$PT = \{S_i\ (8 < i < 13);\ A_j\ (11 < j < 16)\}$
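
As a minimal sketch, the three beginning-of-cycle portfolios can
be written as data and checked for coverage. The class and
function names are hypothetical, and treating the bounds as
inclusive is an assumption: the printed "<" may stand for "<=",
which is why the sketch exposes the choice as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portfolio:
    """One actor's portfolio: index bounds on activities S_i and aggregations A_j."""
    actor: str
    order: int           # position in the sequence of interventions
    activities: tuple    # (low, high) bounds on i
    aggregations: tuple  # (low, high) bounds on j


def index_set(bounds, inclusive=True):
    """Expand (low, high) bounds into a set of indices.

    inclusive=True is an assumption about how to read the printed bounds.
    """
    low, high = bounds
    if inclusive:
        return set(range(low, high + 1))
    return set(range(low + 1, high))


# Beginning-of-cycle portfolios, with bounds copied from the text.
BEGINNING = [
    Portfolio("Administrator", 1, (1, 3), (1, 6)),
    Portfolio("Teacher", 2, (3, 8), (6, 11)),
    Portfolio("Student", 3, (8, 13), (11, 16)),
]


def covered(portfolios, field, inclusive=True):
    """Union of the indices that the portfolios cover for one field."""
    result = set()
    for p in sorted(portfolios, key=lambda p: p.order):
        result |= index_set(getattr(p, field), inclusive)
    return result
```

Under the inclusive reading, `covered(BEGINNING, "activities")`
spans every index from 1 to 13 and
`covered(BEGINNING, "aggregations")` spans 1 to 16, so the three
actors leave no activity or aggregation unassigned at the start
of the cycle.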

This model is then obtained: 




End of University Cycle: 
At the end of the cycle, the three actors are 
involved: 




http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 10, No. 2, 2012 



• The Teacher actor portfolio is the first to intervene at the
end of the academic cycle:

$PE = \{S_i\ (29 < i < 31);\ A_j\ (37 < j < 39)\}$

• The Student actor portfolio is the second to intervene at the
end of the academic cycle:

$PT = \{S_i\ (31 < i < 35);\ A_j\ (42 < j < 44)\}$

• The Administrative actor portfolio is the last to intervene
at the end of the academic cycle:

$PA = \{S_i\ (35 < i < 37);\ A_j\ (39 < j < 42)\}$

The following model is then obtained:

Table 1: Role, Activities, Aggregation of Actors

The purpose of the application model is to demonstrate the
balance between all the activities of all actors and their
aggregations at the end of a graduate level.

In this context, we present, as an application, indicators
defined by the university's decision makers and programmed by the
technical staff of the institution's decision-support information
system, in order to improve the performance of each actor.

To better explain this approach, we use a graphic to show the
equilibrium relationship between each actor, their activities at
an undergraduate level [6] and the corresponding aggregation,
taking into account the multiple observations used to develop our
model.

The rest of this section introduces the concepts of our model
using the decision ontology method, and briefly explains the
phases of extracting multidimensional elements, deducing
relations and standardization.

Figure 1. General architecture of a system for building decision ontologies.

A. Basic Concepts for the OD 

The decision ontology (OD) is a representation of knowledge
dedicated to decision support systems. It can be defined as a
repository of multidimensional elements, namely the concepts and
relations that connect all the actors with university activities
and aggregations during the academic cycle.
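
The idea of a repository of concepts and relations can be
sketched as follows. This is an illustrative toy structure, not
the OWL model used by the tool; the class name, the relation
labels ("performs", "aggregates_into") and the sample concepts
are all assumptions.

```python
from collections import defaultdict


class DecisionOntology:
    """Toy repository of concepts and labelled relations between them."""

    def __init__(self):
        self.concepts = {}                 # name -> kind of concept
        self.relations = defaultdict(set)  # (source, label) -> targets

    def add_concept(self, name, kind):
        """Register a concept as an actor, an activity or an aggregation."""
        self.concepts[name] = kind

    def relate(self, source, label, target):
        """Link two known concepts with a labelled relation."""
        for name in (source, target):
            if name not in self.concepts:
                raise KeyError(f"unknown concept: {name}")
        self.relations[(source, label)].add(target)

    def targets(self, source, label):
        """Concepts reachable from `source` through `label`, sorted by name."""
        return sorted(self.relations.get((source, label), set()))


# Hypothetical content: one actor, two activities, one aggregation.
onto = DecisionOntology()
onto.add_concept("Student", "actor")
onto.add_concept("S8", "activity")
onto.add_concept("S9", "activity")
onto.add_concept("A11", "aggregation")
onto.relate("Student", "performs", "S8")
onto.relate("Student", "performs", "S9")
onto.relate("S8", "aggregates_into", "A11")
```

Here `onto.targets("Student", "performs")` lists the activities
attached to the student actor, and following "aggregates_into"
from an activity reaches the aggregation it contributes to.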

B. Development environment 

The tool that implements our method was developed in the Java
programming language (JDK 1.6) under Windows XP. The ontology
handled by this tool is in OWL format. The ontology is accessed
and manipulated through the Protege OWL API provided by the
developers of the Protege ontology editor. The tool takes into
account a relational database that references one or more
ontologies. The conceptual model of the resulting warehouse is
modeled with a UML class diagram in which there are two fact
classes. The diagram below shows the implementation environment
of the tool across the various stages of implementation.

Figure 6.1. Implementation of our model MACA



IV. Implementation Steps and Interfaces

The developed tool implements our proposed MACA model (which
represents the activities of each category of university actors
in relation to their aggregations during a university cycle). It
includes the following main features:

1) Visualization of the domain ontology or ontologies selected,
from which the warehouse will be built,

2) Expression of decision-making needs on the domain ontology
and on the "local ontology",

3) Construction and expansion of the "local ontology",

4) Definition of the conceptual multidimensional schema, and

5) Validation by the designer of the diagram obtained.

The figure below shows the main modules of the implementation of
our tool.

Figure 6.2. Implementation architecture of our model MACA

Conclusion 

To implement this application, we went through three major
phases. The first is the theoretical part, which requires a model
able to handle the academic context, known for its complexity
(different actors, a wealth of data, non-uniform data ...).

This requires a mathematical model defining simple relationships
between the actors, their activities and their aggregations. The
second phase focuses on gathering data and designing a
multidimensional database based on the principle of the decision
ontology. The third phase is devoted to the implementation and
construction of a dashboard that verifies all the proposals made
in the theoretical part.

The results obtained from the available data for the student
actor at the university are encouraging. Obtaining actual data
for the other actors is needed to arrive at a comprehensive
decision-making tool for the university.

Abstract: Taking advantage of today's hyperconnected world,
many institutions of higher education have already implemented 
online education programs, while many others are currently 
investigating online education models for use in the near future. 
As colleges and universities gradually explore the feasibility of 
including distance education programs, privacy protection and 
securing computer networks remains a primary concern for 
them. Privacy concerns when using online education models 
increase as most of the communication among students, faculty, 
and administration occurs electronically. This paper analyzes 
privacy policies in three institutions that represent different 
education models: established face-to-face, established online and 
prospective online models. The paper addresses privacy laws 
related to institutions of higher education in general and offers 
recommendations for those institutions of higher education that 
are considering expanding their face-to-face offerings to include 
online education models. 

Keywords-higher education; traditional; online; privacy; FERPA



I. Introduction



Information technology has a conflicting impact on privacy. 
On one side, technology promotes the use of sensitive digital 
information; it digitally stores and electronically transmits 
customer and employee records. There is a potential danger 
that these records can be accessed in their storage devices or 
can be intercepted as they move across networks. On the 
other side, technology protects sensitive information through 
data encryption, authentication, and secure networks. This 
relationship between technology and privacy is best showcased 
in the area of education, especially online education. 
Privacy policies were originally created for traditional
teaching models [1]. However, universities are beginning to
experience the impact that increased use of online course
delivery systems has on procedures and academic policies.

Over the past few years, there has been a significant 
increase in online education. New models include hybrid 
courses and online-enhanced courses, where several elements 
of online education are incorporated into face-to-face classes. 
Integration of the Internet in colleges and universities has been
decisive in supporting distance learning, access to the vast
amount of information published on the web around the world, and
radical improvements in applications and communication among
tutors and students. At the same time, faculty and other
university employees have become more and more dependent on
information technology. They use computers and mobile devices to
remotely access university networks, expand their connections,
and communicate with their colleagues through professional
networks.

The global reach of information systems at both the 
university and individual level has raised concerns over 
information security and has made universities more vulnerable 
to security threats. Documented cases of security breaches and 
privacy violations at universities and colleges include the theft 
of 173,000 social security numbers, an unsecured alumni 
database used by hackers, and lawsuits filed by alumni who 
sought class action status to represent any students, employees 
and other alumni whose privacy had been violated [2]. 
Mitchell [3] emphasizes the need to develop awareness of 
regulations and laws concerning course material usage in an 
online environment. Institutions of higher education face 
significant challenges regarding privacy and security. 

Most colleges and universities perform similar types of 
commercial activities that raise privacy and security concerns. 
For example, a typical college would process electronic 
applications, accept donations, sell university merchandise, 
textbooks, athletic tickets, or serve food. In addition, 
universities also collect sensitive information from online 
transactions, grades, student records, and health records. A 
typical university will suffer an average decline of 6 to 8% in 
the number of donations when security breaches are disclosed 
[2]. 

The purpose of this paper is to explore the privacy concerns 
and policies in higher education. Three institutions are 
considered as case studies. The first case is from a university 
located in the southeast USA (UTC). Privacy issues at UTC 
and their protection are discussed in the context of UTC's 
traditional programs (UTC-T). The second case is from a 
university in the northeast USA and its online program 
(UMUC-OL). The third case incorporates an upcoming UTC 
online program (UTC-OL) being considered to start in fall of 
2012. The hope is that the analysis provided in this paper can 
be used to address privacy concerns in existing programs at 
UTC-T, UMUC-OL, and establish privacy policies for the new 
UTC-OL. 

In the next sections, the paper addresses the three
institutions' mission statements and the importance of
addressing privacy issues in each institution. Next, privacy
laws in higher education in general, and the Family Educational
Rights and Privacy Act (FERPA) in particular, are discussed.
Finally, several recommendations and conclusions are provided.



II. Mission Statement, Technology, and Privacy Issues at UTC-T

In order to illustrate the importance of privacy and the role
of information technology in supporting the university's
strategic goals, the mission statement of UTC-T is offered below:

The mission of the university is to provide quality 
educational programs that produce academically-prepared 
and business-world ready men and women for a 
competitive global environment. The colleges of the 
university provide high quality educational programs that 
prepare online and traditional students for managerial, 
professional, or entrepreneurial opportunities. 

This mission statement indicates the importance of 
technology to support the strategic goals of UTC-T. These 
goals include fostering multidisciplinary programs and 
innovative curriculum design and delivery, increasing global 
perspectives and providing international opportunities for 
students and faculty, and supporting research and 
professionally engaged faculty. It is implied that technology 
can also be used to strengthen financial and program 
sustainability, engage, challenge and support students, and 
build strong relationships with key stakeholders. 

The dependence on technology becomes a source of 
potential violation of privacy laws and regulations. UTC-T 
must protect its students, faculty, and other employees from 
cybercrime in general and must ensure their privacy in 
particular. In order to achieve this goal, UTC-T must comply 
with federal, state, and organizational regulations. 

III. Mission Statement, Technology, and Privacy Issues at UMUC-OL

UMUC has a clear mission statement. In order to illustrate 
the importance of privacy and the role of information 
technology to support the university's strategic goals, the 
mission statement of the university which is directly related to 
UMUC-OL is summarized below: 

UMUC provides programs for part-time, adult students at 
off-campus sites on an as-needed basis. Its brokering 
functions include assessing needs, monitoring the scope of 
off-campus offerings, and coordinating System resources 
to address off-campus needs. UMUC conducts 
postsecondary degree and non-degree programs 
throughout the nation and the world. 

In order to reach large numbers of students throughout the 
world, UMUC-OL offers several online services: course and 
program offerings, online library resources, online registration 
services, online financial transactions, and online advising. 
UMUC-OL makes every effort to protect the privacy of its
students, faculty, and staff. Pertinent to privacy concerns is
the requirement that the university must release specific
information to law enforcement agencies under the Foreign
Intelligence Surveillance Act, 50 U.S.C. 1861, as amended by
the USA PATRIOT Act.



IV. Mission Statement, Technology, and Privacy 
Issues at UTC-OL 

UTC-OL is an integral component of UTC, and, as a 
result, UTC's mission statement is similar to the future 
statement of UTC-OL. Similarly to UMUC-OL, UTC-OL 
must strive to reach a large number of students, locally, 
regionally, and worldwide. As such, privacy concerns of 
UTC-OL are similar to those of UMUC-OL. Such concerns 
are raised due to potential violations of privacy as a result of 
information storage and communications in course delivery 
systems, online library resources, online registration services, 
online financial transactions, and online advising. Also, UTC- 
OL must make every effort to protect the privacy of students, 
faculty, and staff following federal, state, and local privacy 
laws. 

V. FERPA and Other Privacy Laws at Institutions
of Higher Education

Traditional and online education must comply with the fair
information practices (FIP), which provide students and faculty
with control over the disclosure and use of personal
information. FIP also states organizational obligations for data
protection. As a result, FIP provides the basis for both privacy
laws and self-regulatory programs. Regulations related to
higher education include FERPA, the Health Insurance
Portability and Accountability Act (HIPAA), and the
Gramm-Leach-Bliley (GLB) Act.

The main regulatory requirement related to cyber crime at 
an institution of higher education is the Family Educational 
Rights and Privacy Act (FERPA). FERPA is a federal law that 
applies to all schools that receive funds under an applicable 
program of the U.S. Department of Education. In order to 
protect the privacy of students and their records, the university 
and its colleges must follow FERPA requirements as applied to 
the college: 

• Parents have certain rights over their children's education
records while the children are under 18 years old. Since
college students are typically over this age, the college
must recognize that these rights have transferred from the
parents to the "eligible students."

• At a college level, students have the right to inspect, 
and, if needed, correct their education records. 
Students have the right to a formal hearing if the 
school does not correct the records. 

• The college must have written permission from the 
eligible student in order to release any information 
from a student's education record. 

• The college may disclose, without consent, directory
information such as a student's name, address,
telephone number, date and place of birth, honors and
awards, and dates of attendance. However, schools
must allow eligible students a reasonable amount of
time to request that the school not disclose directory
information about them.
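
The disclosure rules in the list above can be sketched as a
small decision function. This is a simplified illustration that
covers only the cases named in the list (written consent,
directory information, and the student's opt-out) and ignores
any other exceptions the law defines; the field names and the
function name are hypothetical.

```python
# Fields that the list above names as releasable without consent.
DIRECTORY_FIELDS = {
    "name",
    "address",
    "telephone",
    "date_of_birth",
    "place_of_birth",
    "honors_and_awards",
    "dates_of_attendance",
}


def may_disclose(field, has_written_consent, opted_out_of_directory):
    """Decide whether one record field may be released.

    Simplified reading of the rules above:
    - written consent from the eligible student allows release;
    - directory fields may be released unless the student opted out;
    - everything else stays private.
    """
    if has_written_consent:
        return True
    if field in DIRECTORY_FIELDS:
        return not opted_out_of_directory
    return False
```

For instance, `may_disclose("name", False, False)` allows the
release, while the same call for a student who opted out, or for
a non-directory field such as a hypothetical "grades" entry,
does not.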

FERPA requirements must be followed by administration
officials for both traditional and online educational models.
These requirements extend to online student registration,
online access to records, online advising, and online access to
academic programs. For example, both traditional and online
models must pay special attention to grade posting. Some faculty
members discuss or send grades to students via email. An
unauthorized interception of the email or an inadvertent release
to an unauthorized person can result in a violation of FERPA.
Other faculty members post their grades online or in a public
space. Even when student IDs or names are removed, posting the
grades in alphabetical order may be considered a violation of
FERPA requirements. The best way to post grades is to use
approved platforms such as Blackboard, WebTycho, or Banner.

Online and traditional institutions of higher education must 
pay special attention to privacy issues related to other laws and 
regulations. For example, the GLB Act can be implemented in 
the financial aid and bursar's offices and HIPAA can be 
implemented in the student health center or in the office for 
students with disabilities. The GLB Act requires organizations 
(and universities) that offer loans or other financial products to 
consumers (and students) to explain their practices and 
information-sharing policies in order to safeguard sensitive 
data. HIPAA is designed to improve continuity of health 
insurance coverage in the group and individual markets, 
including student markets, to combat fraud, waste, and abuse in 
health insurance and health care delivery. 

The state where the university is located also has approved 
regulatory policies that mitigate cyber crime. Since the 
university operates within the state, it must follow such 
legislation. For example, UTC-T and UTC-OL reside in the 
state of Tennessee, where the most relevant laws are the 
Tennessee State Law for Personal Information Breach 
(TSLPIB) and the Tennessee Computer Crime Act (TCCA). 
TSLPIB defines a breach of security systems as an 
unauthorized acquisition of unencrypted computerized data that 
materially compromises the security, confidentiality, or 
integrity of personal information maintained by the information 
holder. Any direct or indirect access of computer resources for 
the purpose of obtaining money, property, or services is 
considered to be a violation according to TCCA (University of
Tennessee Security Policies, 2011).

VI. Suggested Changes to Improve Implementation 
of Privacy Laws 

This section discusses several policies and cyber-related
changes, which can be used to improve the implementation of
privacy policies for students, faculty, and staff. These
suggestions are useful for traditional, hybrid, and online
models. The goal of these suggestions is to enhance the
protection of privacy in the existing models (UTC-T and
UMUC-OL) and design policies and guidelines for the new
model (UTC-OL).

A. De-Balkanization of Cyberspace

The implementation of FERPA and other privacy laws has 
created silos of information. These silos are supposed to 
prevent information sharing and as such "protect" information 
about education, financial, medical, or other sensitive records. 
However, Cyphert and Garbutt [4] argue that while information
silos make sense from a legal perspective, the lack of sharing
can have a negative effect when dealing with troubled students
or employees. If financial information is combined with
information about classroom performance, behavior, or
attendance, the combined picture may indicate that a student is
suffering from depression. When these records are kept separate,
as in a Balkanized cyberspace, it is difficult to anticipate
problems with disgruntled students or university employees.



Just as in many other universities, UTC currently uses
information systems that do not "talk" to each other. Such
Balkanization of student support services at UTC requires that
students complete several FERPA forms, which are then
distributed to each specific service. By contrast, as is typical
of established online systems, UMUC-OL has been able to
integrate several services in a single interface. It is strongly
suggested that the UTC-OL model integrate protection of privacy
through secure information systems and protect the security of
students and faculty through an integrated information system.
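
The contrast between siloed and integrated records can be
illustrated with a short sketch. The silo contents, field names
and thresholds are all hypothetical; the point is only that
indicators held in separate systems become visible together once
the records are joined on a shared student identifier.

```python
def combined_view(student_id, *silos):
    """Merge the records that each silo holds for one student."""
    merged = {}
    for silo in silos:
        merged.update(silo.get(student_id, {}))
    return merged


def warning_signs(record):
    """List the indicators present in a record (thresholds are assumed)."""
    signs = []
    if record.get("tuition_overdue", False):
        signs.append("financial difficulty")
    if record.get("attendance_rate", 1.0) < 0.5:
        signs.append("low attendance")
    if record.get("grade_trend", 0.0) < 0:
        signs.append("declining grades")
    return signs


# Two hypothetical silos keyed by the same student identifier.
financial = {"s1": {"tuition_overdue": True}}
academic = {"s1": {"attendance_rate": 0.4, "grade_trend": -1.5}}

alone = warning_signs(combined_view("s1", financial))
together = warning_signs(combined_view("s1", financial, academic))
```

Looking at the financial silo alone yields a single indicator,
while the joined view yields three. Any such join would still
have to respect the consent and disclosure rules discussed in
Section V.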

B. Copyright Protection of Course Contents and Intellectual 
Property 

Talab and Butler [5] provide several suggestions for
traditional universities to enforce the protection of copyrights.
Specifically, they suggest that every student and faculty member
should assume that content is copyrighted unless it states
otherwise. Also, every user must read the terms of use for each
file-sharing site and provide recognition or references for any
citation used. Universities must also provide clear copyright
policies related to the development of online content. In the
case of online education, a useful source of guidelines is the
TEACH Act Toolkit available at
http://www.lib.ncsu.edu/scc/legislative/teachkit/overview.html.
Another good source of information about intellectual property
and copyright protection for online education is the Educational
Multimedia Fair Use Guidelines (EMFUG) at
http://www.utsystem.edu/ogc/intellectualproperty/faculty.htm#



C. Management of Sensitive Student and Faculty Records 

All universities must store and protect sensitive student and
faculty records. In addition, online universities have the added
challenge of protecting not only the sensitive academic
information itself but often also its electronic delivery.
Rakers [6] suggests specific steps to control sensitive data and
information. These steps can be used for both face-to-face and
online environments. The suggested steps include an
institution-wide commitment to security and privacy,
implementation of tools to ensure protection in all areas, and
education and active involvement in the privacy rights debate.
Other suggested steps include the use of a dedicated security
team to protect university data, hardening the software and
hardware against potential vulnerabilities, and mitigating risk
by buying insurance coverage.

UMUC-OL provides an excellent example of how an online 
university can add to the above steps. Specifically, UMUC-OL 
protects sensitive information by physically securing servers 
that store content specific information, encrypting sensitive 
information that is transmitted electronically, and using 
authentication procedures to make sure that every individual 
who claims UMUC-OL association establishes their identity. 
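
The source does not describe how UMUC-OL implements its
authentication procedures. As one common technique for this
kind of identity check, the sketch below stores a salted
password hash and verifies a login attempt against it using
Python's standard library. The function names and the iteration
count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, tune for the deployment


def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 digest from a password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for a new account
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Check a login attempt against the stored salt and digest."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(digest, expected_digest)


# Enrollment stores (salt, digest); the password itself is never stored.
stored_salt, stored_digest = hash_password("correct horse battery")
```

A later call such as
`verify_password("correct horse battery", stored_salt, stored_digest)`
succeeds, while any other password fails, so a stolen copy of
the stored values does not directly reveal the password.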

D. Encryption of Sensitive Information 

Traditional and online universities are continuously shifting
toward digital storage and electronic communication. From a
technical perspective, encryption is a necessary measure to
ensure that even when sensitive information falls into the wrong
hands, an attacker is not able to read it. Tables I and II
provide a list of encryption technologies as suggested by
Fritsche and Rodgers [7].

TABLE I. Disk Encryption Software Used by Universities

Software | Supported Platforms | Install Time | Supported Storage Devices | Retail Cost (Single User)
WinMagic SecureDoc 4.2 | WinXP | 72 min | Flash Drive, USB Hard Disk (I,E) | $129
PGP Whole Disk Encryption | WinXP, MacOSX | 82 min | Flash Drive, USB Hard Disk (I,E) | $149
Pointsec 6.0 | WinXP, Linux | 135 min | Hard Disk (I,E) | $149
DriveCrypt 3.5 | Windows XP/NT/2k | 78 min | Flash Drive, USB Hard Disk (I,E) | $161
Utimaco SafeGuard 4.2 | Windows XP/2k/Server 2003 | 73 min | Flash Drive, Hard Disk (I,E), SdCards | $240





It is suggested that online course delivery systems
implement several network security improvements, including
an effective campus-wide firewall strategy, secured wireless
networks, and IP-source spoof protection software. Like
UMUC-OL, UTC-OL must also use a Virtual Private Network,
which encrypts data in transit.

E. Third Party Verification and Authentication Approach 

Online universities may also partner with a third party 
corporation for the verification of online student identity. A 
third party corporation can employ "data forensic techniques," 
similar to techniques used in the financial service industry. 
Such an approach enhances the credibility of the online 
education process [8]. 



Software 


Platforms 


Encryption 
Algorithms 


Cost 


Windows 


Windows 


Data 


Part of 


EFS 


2000, XP 


Encryption 


Windows 






Standard 


XP/2000 






(DESX), 








Triple DESX 




AxCrypt 


Windows 95, 
98, ME, NT, 
2000, XP 


AES 128 


Free 


TrueCrypt 


Windows 


AES 256, 


Free 


4.2a 


Linux 


Serpent, 
Twofish 




DriveCrypt 


Windows 95, 


AES, Triple 


$77.34 




98, ME, NT, 


AES (768) 






2000, XP 


Blowfish 
256, 448, 
Triple 
Blowfish(134 

4) 






Windows 


Blowfish 


$25 


CyberAngel 


95,98,ME,NT, 


128, AES 


(Software) 




2000.XP 


128, 256, 448 


$60/yr 






Two-fish 


Maintenance 






128, 256, 








Standard 








DES and 








Triple DES 





VII. Conclusions 

Colleges and universities are using information technology 
as an effective tool to achieve their mission statements and 
strategic goals. Such dependence on the technology has 
created potential concerns with respect to privacy of students, 
faculty and staff records. Sensitive information may be 
exposed to potential threats of cyber crime. 

This paper focuses on privacy concerns and policies in 
higher education using a mix of traditional and online models. 
The cases of UTC, UMUC-OL, and UTC-OL show the 
necessity of addressing privacy issues in order to achieve 
institutional mission statements. A detailed analysis of FERPA 
and other privacy laws illustrates the dual and contradictory 
impact of online technology on privacy matters. 

The expansion of education models from traditional
face-to-face to online is associated with increased concerns
for privacy. At the same time, the paper concludes that
technology can be used to address such privacy concerns.
Specifically, the paper recommends the use of encryption and a
third party verification and authentication approach to protect
the privacy of students, faculty, and staff. Other recommendations
include de-Balkanization of information in student services, 
copyright and intellectual property protection, and a better 
management of sensitive information.

Abstract - In recent years, various processes with different
aspects have been created in response to the need to produce
secure software. Because many secure software development
methodologies now exist, evaluating them in order to select one
for a specific project is difficult. A framework for evaluating
and measuring methodologies therefore plays an important role.
Evaluation must consider the various parameters of the software
project and pay attention to the similarities, differences,
features and applications of the available methodologies, so
measures are needed that cover these requirements. Despite
research on the analysis and evaluation of secure software
development methodologies from various aspects, an overall
framework for evaluating these methodologies is still lacking.

In this paper, we offer an extended framework for evaluating
secure methodologies that covers the different aspects of a
software development methodology. The framework considers the
needs of project managers and method engineers when choosing an
appropriate secure methodology for a given project.

Keywords - Methodology; Framework; Evaluation; Security;
Criteria

I. Introduction



Software development methodologies are frameworks that
organize software engineering activities in order to provide
the basis required for building software systems. A software
development methodology consists of two main elements [1, 2]:

Software development process: a process defines a targeted
framework of engineering activities for developing software
products. It determines the sequence of activities, the products
of those activities, the responsibilities of individuals and
teams during product development, and the criteria for
monitoring and evaluating the products and activities of the
project.

Modeling language: it consists of a set of grammatical rules
and semantics for modeling the software.

In software projects, methodologies are evaluated and analyzed
using appropriate criteria so that a methodology can be chosen
based on the needs of the organization. Methodology evaluation
leads to a better understanding of the various aspects of a
methodology [3]. Evaluation also serves as a means to compare
methodologies and helps the user select the most suitable one
among several. In addition, the evaluation results can
accelerate the development and enhancement of the methodologies
[3]. Despite the methodologies in use for producing secure
software and the research on analyzing and evaluating secure
methodologies in various aspects of software development, an
overall framework for evaluating these methodologies is lacking.
In this paper, we therefore present a framework and criteria for
evaluating these methodologies that cover project leaders'
requirements for choosing an appropriate methodology.

In the second part of this paper, we review related work in
this field. In the third part, we explain three features that a
comprehensive evaluation framework should have. The fourth part
presents the criteria of the proposed framework, and the fifth
part evaluates the CLASP secure methodology using the proposed
framework.



II. RELATED WORK

The SSE/CMM infrastructure [4] is a process reference model
that explains security aspects at different levels of process
evolution and provides methods for evaluating security
activities. It focuses on the requirements for implementing
security in a system and its associated sub-systems. The problem
with this model is that it does not dictate the use of a
specific process in the organization; its goal is for the model
to be applied within the organization's own processes. This
makes SSE/CMM difficult to understand and implement [5].

OCTAVE is a security approach based on strategies and
techniques. It takes a comprehensive and systematic approach to
information security risk evaluation and enables the
organization's units to understand and position information
security risks. OCTAVE's problem is that it is conceived as a
self-directed approach: the organization's own staff are
responsible for its security strategy, and this work may not be
performed properly for reasons such as insufficient security
knowledge or other responsibilities entrusted to them [5].

The ISO/IEC 15408 standard provides a group of criteria for
evaluating product security. In the past, security was treated
as a set of features that a software product runs. More
recently, security functions have been implemented in the
development process to ensure and validate that software
development is secure. The standard provides criteria by which
users can implement security requirements in their products and
assessors can evaluate the claims made by manufacturers about
their products [6, 7]. The problem with ISO/IEC 15408 is that it
is complex to apply when implementing and evaluating the
security aspects of a software product; it requires specialized
knowledge, which costs time and money [5].

The ISO/IEC 27002 standard was created to protect the
confidentiality, availability and accuracy of information.
Implementing its security controls ensures that the defined
security objectives are met. The problem with ISO/IEC 27002 is
that it includes a large number of security controls but does
not explain how best to implement them [5].

The PSSS process [7], Common Criteria [8] and NIST [9] have
each presented a method for evaluating security activities.

Each of the models and standards listed above presents methods
for assessing security activities, but none of them proposes a
comprehensive framework for evaluating secure methodologies.

III. EVALUATION FRAMEWORK

To evaluate all the basic aspects of a secure methodology and
achieve a comprehensive evaluation, we define criteria for each
of the main aspects. For every aspect, either evaluation
criteria are defined or the goal of the criteria is stated. To
quantify the evaluation, we express criteria in quantitative
form where possible; where no standard quantification method
exists, we define precisely how the criterion is to be evaluated
and the possible values of the evaluation outcome.

Understanding the application scope of a methodology requires
defining appropriate criteria based on the use of methodologies
in practice. This aspect is covered in line with the main
aspects of the methodology, although it may require more
extensive measures.

Each evaluation needs all or part of the evaluation framework,
depending on the goals it pursues. For example, to examine and
compare the security features available in a secure methodology,
the goal of the evaluation is security analysis, and the
criteria drawn from the methodology's main aspects are the
requirements and the criteria used.

IV. PROPOSED EVALUATION FRAMEWORK CRITERIA (EFSM)

The criteria were collected by defining a basic set and
improving it gradually. A basic set of fundamental aspects was
used as the starting point: in the first step, one or more
criteria were defined for each fundamental aspect, independently
of the other aspects. Putting together criteria that were
defined independently yields a set with problems such as
incompleteness, inconsistency, duplication and overlap among
measures.

Therefore, in the second step, we refined the obtained set by
resolving these problems, producing a set of measures that
covers all aspects. The resulting criteria set was treated as a
basic set and completed in later stages.



In the next step, we defined some practical scenarios and
applied the criteria set to them; trying to apply the criteria
in practice revealed their shortcomings and defects. After each
scenario, the criteria set was completed with respect to
overlaps and conflicts. After the scenarios, the criteria set
had stabilized and reached a good level of coverage. In the last
step, the final set of criteria was given a structure and used
for the final evaluation of a secure software development
methodology.

The evaluation criteria fall into five categories at the
highest level: process, modeling language, security, application
and comprehensive. Each category includes sub-categories, which
in turn contain the criteria for evaluating the aspect addressed
by that sub-category. Figure 1 shows the classification of the
criteria.
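
The five-category classification can be written down as a small lookup structure, which also makes it easy to pick only the part of the framework that a given evaluation goal needs. The sketch below is illustrative only: the category and sub-category names follow the section headings A to E of this paper, while the dictionary layout and the helper name `select_categories` are assumptions and not part of EFSM itself.

```python
# Top-level EFSM categories and the sub-categories that the paper names.
# The data layout is an assumption made for illustration.
EFSM_CATEGORIES = {
    "process": ["definition", "production", "needs", "features"],
    "modeling language": [],  # four sub-categories, not named in the prose
    "security": [],           # criteria listed directly, no sub-categories
    "application": [
        "project characteristics",
        "team work",
        "technical and managerial",
        "umbrella activities",
        "fitting methodology",
        "documentation",
    ],
    "comprehensive": [],      # criteria listed directly, no sub-categories
}


def select_categories(goal_categories):
    """Return only the part of the framework needed for one evaluation goal."""
    unknown = set(goal_categories) - set(EFSM_CATEGORIES)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return {name: EFSM_CATEGORIES[name] for name in goal_categories}


# Example: an evaluation whose goal only concerns security and application.
print(select_categories(["security", "application"]))
```

Keeping the hierarchy as plain data means an assessor can restrict an evaluation to the relevant categories without touching the criteria definitions.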



Figure 1. Classification of the evaluation criteria

A. Process Evaluating Criteria

The criteria in this category evaluate the development process
of a methodology from different viewpoints. They are divided
into four sub-categories. Definition evaluates the process
definition in terms of the attributes that a process definition
should have. Production evaluates and reviews the production
cycle and the characteristics of the process outputs. Needs
reviews and evaluates important issues in requirements
engineering. Features considers the general characteristics of
the process as a whole. TABLE I lists the process criteria: the
first column gives the criteria, the second column their
definitions, the third column the range of values of each
criterion, and the fourth column the evaluation of the CLASP
methodology [10, 11], a methodology for secure software
development, using the proposed framework.



Evaluation Framework for Secure Methodologies

TABLE I. Process Evaluating Criteria

Criteria | Defined criteria | Range of values | CLASP

Definition
Clear and unambiguous | Is the production process defined clearly and unambiguously? | Yes/No (specify why) | Yes
Logical | Is the production process logical in terms of providing a broad description or very minor? [12] | Yes/No (specify why) | Yes
Complete | A full definition includes the basic components: production cycle, roles, activities, modeling language, outputs / products, techniques / business practices, rules and umbrella activities. | Real number in (0, 1] | 7/8

Production
Cover the general steps of the production cycle | Which stages of the general production cycle (identification, analysis, design, implementation, testing, commissioning, maintenance, support and closing the project) does the development process cover? [3] | Real number in (0, 1] | 5/9
Production line | What policy applies to the production of outputs (software)? | Iterative, incremental, fast and ... | Iterative, incremental
Enough | Does the development process provide the typical outputs of the general production activities, including feasibility, requirements description, design, modeling, documentation, testing, training and commissioning? [13] | Real number in (0, 1] | 4/8
Coordination among the products | Is there coordination and a logical relationship between the products, and do they complement each other? | High, medium, low | Medium
Standards | Is there a standard for producing outputs and products? | Yes (standard), No | Yes (write classes)

Needs
Identification method of needs | How are software needs collected in the process? [3] | Activities associated with the identified needs, the roles involved and their output | By capabilities that produce of first phases
Description format of needs | How are the requirements expressed? | Component, usage scenario, user story, feature, use-case | Feature
Based on operational / non-operational needs | Is the production process based on needs? | Yes (techniques), No | Yes (production process is based on needs recognition)
Changing needs | Does the process support changing needs? | Yes (techniques), No | Yes (review needs)
Prioritization methods of needs | What are the criteria for prioritization of needs? [13] | Architectural value, functional value, business value and risk of implementation | Risk of implementation

Features
Size / complexity | The size / complexity is defined as a function of the building blocks of the production process (process, outputs, work procedures, activities and roles). | Integer greater than zero | (number of roles + 17)
Completeness | How complete is the production process in terms of definition, coverage of the general processes, and sufficiency of the products / outputs of the production cycle? | Real number in (0, 1] | (4/8 + 5/9 + 7/8)/3
Be applied | Is the production process practical? | High, medium, low | Medium
Ability to apply | Is the production process applicable? | High, medium, low | Medium
Documentation | Are the production process and the activities undertaken during the implementation process documented and provided to software engineers? | Yes, No | Yes
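
The fractional CLASP values in TABLE I can be checked mechanically. The sketch below treats each ratio as covered items over reference items and averages the three ratios named in the completeness row. The helper name `coverage` and the variable names are assumptions made for illustration; only the numbers come from the table.

```python
from fractions import Fraction


def coverage(covered, total):
    """Share of reference items that a methodology covers, as a value in (0, 1]."""
    if not 0 < covered <= total:
        raise ValueError("expected 0 < covered <= total")
    return Fraction(covered, total)


# CLASP values reported in TABLE I.
definition_complete = coverage(7, 8)  # "Complete" criterion
cycle_coverage = coverage(5, 9)       # "Cover the general steps" criterion
output_sufficiency = coverage(4, 8)   # "Enough" criterion

# Completeness row: unweighted mean of the three ratios.
process_completeness = (
    output_sufficiency + cycle_coverage + definition_complete
) / 3

print(process_completeness)                   # 139/216
print(round(float(process_completeness), 2))  # 0.64
```

Exact fractions avoid rounding drift when the same ratios are reused in later criteria.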



B. Modeling Language Evaluating Criteria 

Since the modeling language is less important in secure
methodologies, these criteria are condensed into four
sub-categories that evaluate the features of the modeling
language. TABLE II, which has the same structure as TABLE I,
lists the evaluation criteria for the modeling language.



TABLE II. Modeling Language Evaluating Criteria

Criterion | Criterion definition | Range of values | CLASP
Simplicity | Is the modeling language easy to learn and use? | Yes, No | ...
Power of language | Is the modeling language powerful? | Yes (for example), No | ...
Techniques to resolve conflict | Has the modeling language provided any techniques to resolve conflicts between the models? [3] | Yes (techniques), No | -
Methods for managing complexity | Has the modeling language provided methods to manage the complexity? | Yes (methods), No | ...

C. Security Evaluation Criteria

This category of criteria evaluates the security features of a
methodology in the various phases of the product life cycle.
TABLE III shows the security evaluation criteria.



TABLE III. Security Evaluating Criteria

Criteria | Criterion definition | Range of values | CLASP
Software security training | What level of security training will be given to project team members? | High, medium, low | Medium
Security information | What level of security information is there in the security requirements, as a function of security analysis results information, information on product vulnerabilities discovered in gander solving, software security events information and changes information? | Real number between zero and one | 3/4
Analysis of security requirements | Which of the security requirements are analyzed and identified? | Environment, functional, software development process | Environment and functional
Apply software security principles in design | To what extent are software security principles applied in design? | High, medium, low | High
Improve the security of design | What can be done in order to improve the design (of security)? | Design evaluation, internal and external review of design (of security), design simulation | Design evaluation
Use security tools of executable program generation | What activities can be done to use security tools to generate executable programs? | Identification of security tools, use them correctly, function accuracy control | Identification of security tools and use them correctly
Security monitoring after product installation | Is the product constantly monitored for security after installation? | Yes, No | Yes
Security response | To what extent does it show an appropriate response to security problems? | High, medium, low | High



D. Application evaluation criteria 

The criteria in this category, which are defined in six
sub-categories, address the most important aspects of applying
a methodology. Project Characteristics considers the
project-related parameters that are most useful for selecting
an appropriate methodology and examines them for the
methodology. Team Work investigates the potential of the working
group. Technical and Managerial investigates and evaluates
technical and managerial capabilities. Umbrella Activities
investigates the umbrella activities of the methodology, which
are critical to its real-world use. Fitting Methodology
considers the main points in tailoring and adjusting the
methodology. Documentation evaluates the methodology with
respect to guidance documents and experience reports. TABLE IV
shows the application evaluation criteria.



TABLE IV. Application Evaluating Criteria

Criteria | Criteria definition | Range of values | CLASP

Project characteristics
Project size | What is the size of the project? | Large, medium, small | Large
Domain | What are the fields of project use? | System, prompt, business, engineering and scientific and ... | System, business, engineering and scientific
Dynamism | What percentage of the demands changes in a month? | Real number in (0, 1] |
Complexity | How high is the computational complexity? | High, medium, low | Medium
Project priority | What is the main goal of the project? [14] | Productivity, observation capability, repeatability, accuracy, reliability, compatibility, quality and security | Security
Restrictions | What are the particular restrictions of the project? | Restrictions name | ....

Team work
Team size | How many people are there in the production team? [15] | Person | ....
Level of training | What is the level of training? [3] | 3, 2, 1B, 1A, -1 | 1B
Level of experience | What is the level of experience? | High, medium, low | Medium
Skills in project scope | What is the level of skill in the project scope? [15] | High, medium, low | Medium

Technical and managerial
Style of programming | What is the style of programming? | Simple, complex | ....
Abstraction mechanism | What is the mechanism of abstraction? | Object-oriented, agent-oriented and ... | -
Procedures for testing and debugging | What procedures exist for testing and debugging? | Unit testing, compound testing, and acceptance testing | Testing security
Size of management team | What is the size of the management team? [16] | Large, medium, small | Small
Project manager experience | What is the level of the project manager's experience? | High, medium, low | High
Team management approach | What is the team management approach? | Central, distributed | Distributed
Working culture | What is the work culture? [16] | Cooperative, partnership, non-cooperation | Cooperative and partnership

Umbrella activities
Project management | What level of process management activities (planning, scheduling, monitoring, and reviewing the production process) can be supported? | Real number in (0, 1] | 4/4
Configuration management | To what extent are approaches and tools of software configuration management (SCM) supported? | The supporting level of SCM | Incident
Team management | Has the methodology provided a procedure for teams and people management? | Yes (techniques), No | Yes (certain works teams)
Quality assurance | Does the methodology support quality assurance techniques? | Yes (techniques), No | Yes (revision patterns and evaluation conclusion)
Risk management | Does the methodology support risk management techniques? | Yes (techniques), No | Yes (schematization and development report)

Fitting methodology
Compliance / compatibility | Does the methodology provide for adapting itself to the project needs? | Yes, No | Yes
Flexibility | Does the methodology support changes in process modeling language during the run? | Yes (how), No | Yes (review schematization and process)
Extend | Is the methodology extendable and has it provided extension points? | Yes (the expansion), No | -
Merging with other methodologies | If the methodology is flawed and some aspects need to be completed, has it provided methods for merging with other methodologies? | It is not necessary; it is necessary but not provided; it is necessary and provided | It is not necessary

Documentation
Primary documents | Is methodology training documentation available? | Yes, No | Yes
Experimental evidence and reports | Are there documents giving experimental evidence of practical use of the methodology? | Yes, No | Yes



E. Comprehensive evaluation Criteria 

Comprehensive criteria are defined over one or more of the
process, modeling language, security and application aspects,
or over the methodology as a whole, and are easily evaluated
using the results of evaluating the other aspects of the
methodology. TABLE V shows the comprehensive evaluation
criteria.



TABLE V. Comprehensive Evaluating Criteria

Criteria | Defined criteria | Range of values | CLASP
Performance | What is the performance of the methodology as a function of team size, number of outputs, number of roles involved and the speed of the production team in each iteration? | Real number greater than zero | This criterion depends on the assessor's judgment
Usability | What is the usability as a function of the number of guidelines, roles, level of consistency of the methodology with the scope and leanness of the production process? | Real number greater than zero | This criterion depends on the project
Completeness | How complete is the methodology as a function of process completeness, umbrella activities coverage and the definition of a modeling language? | Real number in (0, 1] | (0.64 + 0.8 + 0)/3 = 0.48
Methodology status | What is the current status of the methodology? [17] | Young, growing, active, set aside | Developing
Production process | How is the production process defined in the methodology? | Explicit, implicit, undefined | Explicit
Restrictions | What restrictions affect the use of the methodology? | Restrictions name | ....
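
The CLASP completeness value in TABLE V is consistent with an unweighted mean of its three inputs, since 1.44 / 3 = 0.48. The worked form below makes that reading explicit. The symbol names are assumptions introduced here for illustration, the three terms are read in the order given in the criterion definition (process completeness, umbrella activities coverage, modeling language), and the equal weighting is inferred from the arithmetic rather than stated by the framework.

```latex
\[
C_{\text{methodology}}
  = \frac{C_{\text{process}} + C_{\text{umbrella}} + C_{\text{language}}}{3}
  = \frac{0.64 + 0.8 + 0}{3}
  = 0.48
\]
```

Here 0.64 is the rounded process completeness from TABLE I, (4/8 + 5/9 + 7/8)/3 = 139/216, which is approximately 0.64.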



V. CLASP EVALUATION WITH EFSM 

In the CLASP process, the successful experiences and
guidelines of others in software development have been put to
good use and are organized into activities and the steps of
each activity. CLASP considers different aspects of producing
and developing secure software.

A major defect of CLASP is the lack of a definition of the
work products that have to be produced in each activity and its
steps. The implementation steps introduced in each activity are
sometimes very general and merely refer to the successful
experiences of others in that field. Together with the lack of
precise descriptions of the work to be done at each step, this
causes a lack of integrity among the work products produced by
the development team; its first cost is the loss of seamlessness
and traceability in the process. Moreover, the absence of an
integrated modeling language in this process compounds the
problem, because system developers have to use various methods
for description and modeling in the various activities and
implementation steps.

VI. Conclusion 

The growing use of secure methodologies to produce software
has made it difficult to select an appropriate secure
methodology for a specific project. Analysis and evaluation of
these methodologies can lead to a better understanding of their
various aspects. The results of the assessment can also speed up
the extension and enhancement of the methodologies. Despite
research on the analysis and evaluation of secure methodologies
in various aspects of software production, an overall and proper
framework for evaluating these methodologies is still lacking.

In line with these requirements, the EFSM framework was
proposed. It was created by collecting and integrating criteria
and by considering the security activities required in software
production. EFSM is a multifaceted and structured framework that
aims to provide quantitative analysis and evaluation of secure
methodologies. It helps project managers and method engineers
select the appropriate method for a project by evaluating secure
methodologies. It also presents the criteria sets in
hierarchical form, so that the criteria can be considered at the
desired level of detail, are easy to maintain, and are easy for
the user to understand and use.

EFSM can serve as a shared reference model for estimating the
maturity and applicability of secure methodologies. Considering
the general aspects of a methodology, it can be argued that the
proposed criteria set is a superset of the criteria sets of the
frameworks examined.

One of the most challenging aspects of eXtreme Programming is
bringing all team members and the customer together on site.
The problem becomes serious when the team members are not all in
the same place or the customer cannot provide a representative
for the development team. This situation leaves both the
customer and the team members with incomplete information. In
this research, we address the problem by implementing computer
supported cooperative work (CSCW) as a tool to improve the
eXtreme Programming method. By joining these two concepts, we
obtain a 15% productivity improvement, measured as the ratio
between XP projects with CSCW and without CSCW.

eXtreme Programming, CSCW, Software Development. 



I. Introduction

Software development is a resource-limited cooperative game of
invention and communication. The primary goal of the game is to
deliver useful, working software [4]. Software can be developed
by an individual or by a team. When software is developed by a
team, the complexity of communication can cause misunderstanding
of the software itself. With the increasing need for
collaborative work, programming has acquired more and more
social components [3]. These social components consist of
information sharing, resource sharing, and experience sharing.
Table I illustrates examples of the social components in
software development.



TABLE I. Social Components Examples

Social Components | Examples
Information sharing | A team member shares the requirements with the other peers.
Resource sharing | A team member shares a third party component or article with the peers to tackle a development problem.
Experience sharing | A team member shares the knowledge base for a specific case that was already solved in the past.



Social components may be imperfectly formed when a team member
is unavailable at a certain moment. For example, when a member
cannot attend the stand-up meeting, the member misses some of
the information, and the other peers have to brief the member on
another day so that he knows the project progress.

eXtreme Programming (XP) makes great use of osmotic
communication, face-to-face communication, convection currents
of information flow, and information radiators on the wall [2].
This becomes challenging when team members are highly mobile and
the information itself has a short lifecycle. A short
information lifecycle can be seen when a member writes
information on the whiteboard one day and it is erased the next
day because new information has arrived. A team member who did
not see the information loses it as well.
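
The whiteboard example can be contrasted with an append-only log, which is one way a groupware tool might keep information available to members who were absent. This is a minimal sketch under stated assumptions: the names `Note`, `RadiatorLog` and `missed_by`, the authors and the dates are all hypothetical and do not come from this paper or from any specific CSCW product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Note:
    """One piece of project information (hypothetical structure)."""
    day: date
    author: str
    text: str


@dataclass
class RadiatorLog:
    """Append-only stand-in for a whiteboard: notes are never erased."""
    notes: list = field(default_factory=list)

    def post(self, note):
        self.notes.append(note)

    def missed_by(self, last_seen):
        """Notes posted after the day a member last looked at the log."""
        return [n for n in self.notes if n.day > last_seen]


log = RadiatorLog()
log.post(Note(date(2011, 3, 1), "ana", "Login story estimated at 3 points"))
log.post(Note(date(2011, 3, 2), "budi", "Customer changed the report format"))

# A member last present on 1 March can still retrieve the 2 March note.
for note in log.missed_by(date(2011, 3, 1)):
    print(note.day, note.author, note.text)
```

Because nothing is overwritten, a member who missed a day queries the log instead of relying on a peer to repeat the information.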

The impact of losing information is felt by both the client
and the development team. When the client is too busy, or does
not know what is really needed, the delivered software drifts
away from what was expected. Many clients regard software
development as a simple matter of code construction. Therefore,
some of them feel that they do not always have to be on site
with the development team. Although Extreme Programming states
that this is strongly needed, clients who want to sit side by
side with the team all the time are rare.

Our idea for solving this kind of situation is to provide the
team, as well as the client, with a tool that supports
collaboration and cooperative work. Such a tool is known as
groupware or computer-supported cooperative work (CSCW). We map
the information needs of XP to an appropriate CSCW tool. This
research makes the following contributions:

• We explain precisely how we set up the CSCW tool in
line with XP information. To our knowledge, this has not
been done before.

• We capture the information architecture in the CSCW tool
as a collaborative model for Extreme Programming.

These contributions can be adopted by XP teams that want to
manage information more precisely with the support of a
CSCW tool.
II. Mapping XP with CSCW Tool



XP has four core values that can be applied throughout the
software development lifecycle. These values help the
development team learn and drive a set of software
development practices. Figure 1 illustrates the values according
to Beck (2001).
Cold communication primarily happens when the team, including
the customer, is unavailable. This kind of communication raises
problems such as:



Figure 1. XP Values (Communication, Simplicity, Feedback, Courage)



Direct communication is a basic rule in an XP team: when a
person has knowledge about the system, he or she should share
it with the others. Direct communication concerns not only the
transmitted information but also how the receiver perceives it.
Perception can be synchronized through several modalities.
Table II gives examples of direct communication modalities.



TABLE II. Direct Communication Modalities in Software Development

Modality Model     | Example
Physical proximity | In a stand-up meeting, the speaker may move closer to indicate aggressiveness or enthusiasm. The listener may move closer to indicate interest, agreement, or the desire to speak; or the listener may move away to indicate fear, disagreement, or the need to think privately for a moment.
Visual             | Explaining a concept through 3D visual information, such as video, slide shows, or even a scratch diagram on a whiteboard.
Kinesthetic        | A team member uses the sensation of movement to help construct a new explanation or to help improve the framing of a question.
Sound              | The speaker uses pitch, volume, and pacing to differentiate and emphasize ideas in a sentence.
Low latency        | Extreme Programming values direct communication because it has low latency between request and response. For example, when a peer asks a question, the other team members feel that it should get an immediate answer or response.



• Missing information and differing perceptions.

• Slow feedback. This happens because the member is not
directly engaged in responding to the communication.

• Lack of simplicity. The team has to create
documentation or retell the problem to reach a shared
perception.

All of these problems indicate that poor communication is the
root obstacle to realizing the XP values. Many of these
problems cannot be avoided in today's mobile world.

Starting from the belief that computer-supported cooperative
work (CSCW) can close the communication gap, we build a model
that reduces the presence problem in Extreme Programming.
CSCW is a set of theories, often qualitative case studies with
thick descriptions, that aim to develop common themes and
patterns in cooperative work [7]. CSCW is realized in software
tools commonly known as groupware.

Groupware provides shared workspaces for all users who have
joined a group. Shared workspaces aim to support cooperative
and communication tasks. They provide users with a virtual
space in which information can be shared and exchanged [8]. A
shared workspace offers well-known features such as:

• Forum discussion: enables users to discuss a specific
topic in a forum.

• Instant messaging: enables multiple users to chat by
text.

• Sketchpad: enables multiple users to work on an idea
together through a whiteboard-like feature.

• Shared calendar: provides a calendar that can be
accessed by all users.

XP captures information in several forms, such as the release
plan document, class-responsibility-collaborator (CRC) cards,
story cards on the wall, and the personal communication log.
Each of these forms should be mapped to an existing groupware
feature. Figure 2 illustrates how we map XP information to
groupware features.



These modalities are referred to as hot communication, which
happens only in face-to-face communication. The opposite of hot
communication is cold communication. Requirements documents,
email, and video are examples of cold communication. Cold
communication means that there is no need for immediate
action.
Figure 2. Relation between CSCW Features with XP Information

Figure 2 shows that:
files. Manual versioning is done by adding document information
to the file name. Our naming template is
codefile-projectname-builddate.extension. For example, a
release plan document for a project named Whistler that was
saved on 12 January 2009 is named
releaseplan-whistler-120109.doc. There are other approaches,
such as using metadata or adding the version to the body of the
document. However, we find manual versioning simple yet
powerful, and it needs no extra processing. We apply manual
versioning to documents, source code, and binary releases.
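The naming template can be illustrated with a short Python sketch. The function name versioned_name and the choice to lowercase the whole name are assumptions made for illustration; the date format ddmmyy is inferred from the Whistler example in the text.

```python
from datetime import date


def versioned_name(code: str, project: str, build: date, extension: str) -> str:
    """Build a name of the form codefile-projectname-builddate.extension.

    The build date is written as ddmmyy, matching the example
    releaseplan-whistler-120109.doc (12 January 2009).
    """
    stamp = build.strftime("%d%m%y")
    return f"{code}-{project}-{stamp}.{extension}".lower()


example = versioned_name("releaseplan", "Whistler", date(2009, 1, 12), "doc")
print(example)
```

Because the date is part of the name, two saves on different days never overwrite each other, which is the whole versioning mechanism.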

The second fact leads us to examine the direct communication
log more closely. Direct communication in XP takes various
forms: the planning game, the stand-up meeting, pair
programming, and the postmortem after every small release.

Direct communication is the main interaction between the
customer and the development team. When the customer cannot
meet the development team in person, CSCW tools support both
parties through multimodal communication. The details of the
multimodal communication are adopted incrementally, according
to the users' communication needs. We identify three aspects
that multimodal communication requires:



• Some of the XP information is highly dependent on
file sharing.

• The direct communication log relates strongly to
several CSCW features.

The first fact leads us to create a structure model for file
sharing. The model consists of several directories, each with
its own purpose. Table III shows the directory structure that
we developed for XP file sharing.



TABLE III. Folder Structure Model for XP

Folder Name | Functional Description
Bin         | Contains all the small software releases that we have already created.
Res         | Contains the resources we need to tackle problems in our software construction: books, articles, or images related to the project.
Lib         | Contains the sample source code, libraries, or third-party components that we will use in the project.
Doc         | Contains development documentation such as the release plan, CRC cards, or user stories document.
Temp        | Acts as a temporary folder. It can be used for backup files when one or more team members make a change in the spirit of XP courage.
Misc.       | Contains files that are not directly related to the construction process, for example proofs of receipt, lost and found, and other managerial material.
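As a minimal sketch, the folder model of Table III could be created with a few lines of Python. The project name "whistler", the helper create_workspace, and the use of a temporary directory are assumptions for illustration only; a real team would point the root at its shared drive.

```python
from pathlib import Path
import tempfile

# Folder names taken from Table III (the trailing dot of "Misc." is dropped).
XP_FOLDERS = ["Bin", "Res", "Lib", "Doc", "Temp", "Misc"]


def create_workspace(root: Path) -> list[Path]:
    """Create the six XP folders under root and return their paths."""
    created = []
    for name in XP_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


with tempfile.TemporaryDirectory() as tmp:
    folders = create_workspace(Path(tmp) / "whistler")
    names = sorted(p.name for p in folders)
    all_exist = all(p.is_dir() for p in folders)
    print(names)
```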



The project leader creates this entire folder structure in the
file-sharing tool and manages every change to it through manual versioning



• Presence 

• Perception 

• Interactivity

These aspects lead us to choose instant messaging as the
primary tool, since it provides:

• Presence, through status tags.

• Perception, through various communication channels such as
text chat, video chat, and audio chat.

• Interactivity, through its instant nature.

There are two types of instant messaging collaboration.
The first is the private model (person to person), and the
second is the conference model, in which many participants
collaborate together. Figure 3 maps the instant messaging
models to XP activities.



Figure 3. Mapping Instant Messaging Model with XP Activities (diagram labels: Private Model, Conference; Pair Programming, Planning Game, Stand Up Meeting)

We save every communication log from both communication
models, when permitted, and store it in the /Misc/ folder of
the file-sharing system.
In order to enrich the sharing experience, we combine the
chat tools with a sketchpad or even with screen sharing.
Screen sharing is useful when we do pair programming from
separate locations.

For 'colder' captured content that needs follow-up, we
implement forum discussions. A forum gives the team a longer
memory of the information flow and lets members submit
arguments, ideas, or further discussion over time. Figure 4
illustrates how a forum gives a clear, tree-like view for
tracking information.
information architecture. We demonstrate this with a small
experiment on two identical projects. We call them identical
because they share the following characteristics:



Figure 4. Forum Provides Information Flow for the Team (screenshot of a forum thread, "Release Plan... documentation revision", with replies dated 1/28/2009)



III. Mapping Model In Mobility World 

We find that the three approaches (file sharing, instant
messaging, and forum discussion) efficiently keep a person,
whether team member or customer, up to date even when he or she
cannot attend one or more meetings. For example, a member who
does not attend a stand-up meeting will typically do one of the
following:

• Ask a peer about what was missed.

• Search through the computer or the code to find what has
changed, and learn the changes from there.

Both actions cost the team member time and effort. The CSCW
concept that we implemented instead allows a team member to:

• Discover the missing information through a single
point, the file-sharing system.

• Explore the information flow and the architecture
through file versioning, the forum, and the sketchpad.

• See the important pieces of experience and knowledge
through the communication log.

These actions can be combined with the chat tools,
sketchpad, or screen sharing among the CSCW features. All of
these features let the team keep in touch while on the go.
Therefore, when a team member or the customer cannot be
onsite, they can still communicate through the chat
platform.

Refining the use of CSCW as a support tool in XP gives us the
basic understanding that CSCW tools should:

• Be able to map XP practices to suitable tools.

• Be able to facilitate, through their features, the direct
communication that exists in XP.

Besides the tools themselves, we also consider the
management process needed to build, maintain, and trace the



• Having the same complexity in terms of time, resources,
cases, and budget.

• Having the same client, since the projects are part of a
long-term contract with the client.

• Being carried out by team members with identical
experience and skills.

We see a high success rate from using manual versioning and
from the discipline of saving every conference log to file
sharing. We tabulate five interesting cases that reflect the
interaction between team and client:

• Initial management setup: the time each team needs to
set up and create baseline workspaces.

• Miscommunication in the stand-up meeting: this case
occurs when a team member or the client has a different
perception of a concept or vision. We capture this because
it is one of the core problems that arise when members
don't have

• Lost and found: our term for searching for and finding
related information that does not explicitly exist in the
release plan document or user stories.

• Building wrong releases: this occurs when a member
builds new code on an old or unstable release of the
software. It happens when a member does not get the current
source code, for instance after missing the latest email,
not downloading the latest source code, or missing the last
stand-up meeting.

• Complaints received from the client: these occur when
the client perceives a mismatch between the result and what
was expected. In our case, the client is offshore and
therefore cannot always be onsite with the team.

Table IV gives the results of this experiment, in which two
projects of identical complexity were handled with and without
CSCW tools.



TABLE IV. XP WITH OR WITHOUT CSCW

Case                                          | Project A (with CSCW) | Project B (without CSCW) | Percentage Ratio A/B
Initial Management Setup (hours)              | 4                     | 1                        | -75%
Miscommunications in stand-up (cases/project) | 3                     | 7                        | +42%
Lost and Found (cases/week)                   | 1                     | 4                        | +25%
Building in wrong releases (cases)            | 1                     | 3                        | +33%
Complaint received from the client (cases)    | 3                     | 6                        | +50%
Means                                         |                       |                          | +15%



We quantify the ratio between Project A and Project B. We
found that initial management setup takes more time when using
CSCW, because every member has to prepare and configure the
tools. Specifically, they have to carry out the following
actions:

• Setting up the software on each workstation.

• Configuring the network connection for the CSCW tool on
each client.

• Creating and configuring the model for the CSCW tool, such
as creating the folder structure and enabling specific features.

When setup is complete, the software lifecycle starts and XP
is applied. This is when interaction sometimes produces
miscommunication, lost-and-found cases, builds on wrong
releases, or even complaints from the customer. Table IV
shows that CSCW gives a more flexible way to:

• Reach a shared perception based on facts recorded in the
CSCW tool.

• Find information faster and better through a
structured folder layout.

• Make the communication between peers and the client
more structured and logged, for better communication
and traceability.

Quantitatively, the average productivity gain is about 15%.
This number is obtained by averaging the percentage ratios of
the inspected cases. It is our first result indicating that
integrating CSCW into XP can enhance productivity for mobile
communication.
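The averaging step can be checked with a few lines of Python. The percentages below are copied as reported in Table IV; the variable names are illustrative only, and the sketch does not re-derive the individual ratios.

```python
# Percentage ratios as reported in the last column of Table IV.
reported_ratios = {
    "Initial management setup": -75,
    "Miscommunications in stand-up": 42,
    "Lost and found": 25,
    "Building in wrong releases": 33,
    "Complaint received from the client": 50,
}

# Arithmetic mean over the five cases.
mean_ratio = sum(reported_ratios.values()) / len(reported_ratios)
print(f"Mean percentage ratio: {mean_ratio:+.0f}%")
```

Note that the negative setup cost is part of the mean, so the 15% figure already accounts for the extra configuration time of the CSCW tool.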

IV. RELATED WORK 

Our work focuses on accelerating software development by
means of a better collaboration tool that supports member
mobility. The result has a similar impact to the research of
Augustin et al. [1], who work toward better results through
collaborative software development. However, our subject and
method are somewhat different.

Our work also studies the role of instant messaging
technology in XP. Our investigation proposes a session
management model focused on XP. This result follows the same
pattern that Hansen and Damm [6] found in their investigation
of context-aware instant messaging, and it is best combined
with wiki collaboration in collaborative software development
[9]. We find the wiki structure valuable and adopt it as the
file structure model in our XP method. All of the features
that we implemented are encapsulated by the term workspaces.
The concept of well-organized workspaces is also adopted from
the research of Rubart et al. [8], who organized enterprise
workspaces through component-based cooperative hypermedia.

Mobility and interaction sometimes cause small conflicts
within the team, such as miscommunication and differing
perceptions. Many of these misunderstandings happen because
one peer has less information than another. Information can be
limited when a user misses a meeting or is on the move. We
found a lot of useful information about conflict in
collaborative software development in [5], and we adopt some
of its concepts in our mapping between CSCW roles and XP.

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 10, No. 02, 2012



V. CONCLUSIONS and FUTURE WORK 

We created a mapping model between the XP method and a CSCW
tool, mapping several CSCW features to XP. We found that
versioning within the folder structure, capturing
communication logs through instant messaging, and storing
cold information in a discussion forum are good practices for
tackling the mobility problem. To support our model, we
conducted a field experiment that shows a productivity
difference and fewer conflicts when CSCW tools are used.

Our model still lacks an assessment and evaluation model.
The integration model that we delivered is also insufficiently
detailed in its actions. For example, we did not further
investigate CSCW implementation in pair programming
practices. We see this as a topic for further research.
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Abstract — Most signature-based antivirus products are 
effective at detecting known malware but not unknown malware
or malware variants, so they often lag behind malware. In
addition, most antivirus approaches are complex for two
reasons. First, the large numbers of malicious and benign code
samples needed as a training dataset are difficult to collect.
Second, training the classifiers consumes a lot of time.

The Immunity PE Malware Detection System (IPEMDS) was
designed to give computer systems PE homeostatic capabilities
analogous to those of the human immune system. Because the
constraints of living and computational systems are very
different, however, we cannot create a useful computer security
mechanism by merely imitating biology. The IPEMDS approach was
first to choose a set of requirements similar to those of the
immune system. It then created abstractions that capture some
of the important characteristics of biological homeostatic
systems, and used these abstractions to guide the design of two
levels of defense, which together are called IPEMDS.

The goal of IPEMDS is to obtain a high detection rate and a
very low false positive rate. The challenge IPEMDS takes on is
to achieve this goal while depending only on a finite number of
benign files to distinguish new benign executables from malware
executables, both previously unseen by IPEMDS.

Keywords: Heuristic analysis, Packed Executable, Homeostasis,
Dendritic Cell Algorithm (DCA), Toll-like Receptors (TLR), Global
Alignment, API.

I. Introduction 

Windows PE viruses are becoming more and more common because
they propagate easily between different platforms and, owing to
their portable file format, are difficult for antivirus
software to detect. In addition, PE files have become the
favorite target of most malware writers who exhibit their
techniques in the malware community. These developments drive
the continual upgrading of PE malware, which makes it
increasingly difficult for antivirus software to detect [1].

Malware is also growing rapidly because it applies various
techniques to protect itself from detection by antivirus
solutions. Packing is representative of these protection
techniques. It is no exaggeration to say that most malware is
currently distributed in packed form; in other words, packers
are widely used for malware protection. Analysts must therefore
determine, before analyzing the malware, whether it is packed
and, if so, which packer was used. Several packer detection
tools have been released and are used for these procedures [2].

There are relatively few mechanisms in existing computer
systems that are analogous to the immune system. Anti-virus
(AV) scanners primarily detect viruses by looking for simple
virus signatures within the file being scanned. The signature of
a virus is typically created by disassembling the virus into
assembly code, analyzing it, and then selecting those sections
of code that seem to be unique to the virus. This approach can
be easily subverted by changing the virus's code (and thus
the virus signature) in trivial ways [3]. Most viruses in the wild
today are of the "simple" type, neither encrypted nor
polymorphic, and many of them have variants that come out
afterwards. These variants are inherently similar to the
original virus, yet current signatures fail to detect them
without further updates from AV vendors. This indicates that
present-day signatures are too weak to withstand simple changes
to the virus body (e.g., dates, port numbers, variable names) [3].
None of these systems, however, is anywhere near as robust,
general, or adaptive as the human immune system.

To improve performance, a novel immune-based approach to
detecting unknown Windows PE malware is proposed. It is based
on static analysis of PE executable files, without the need to
run them or load them into memory. Another property of the
system is that, at the outset, it depends only on benign PE
executables to build its information database. The approach
therefore faces the challenge of separating unseen benign
executables, which continuously enter the computer, from all
unseen and unknown PE malware.

In immunology, there are two distinct viewpoints about the
main goal of the immune system. The classical self-non-self
viewpoint states that the immune system discriminates between
self (human body cells and molecules) and non-self (other
invading cells and molecules), whereas the danger theory
viewpoint holds that the immune system looks for dangerous
elements and events, whether self or non-self [4].

In this paper, the term suspicious means that a file may be
either benign or malware.

II. The dataset 

IPEMDS only considers malware in the Win32 PE format. The
training set consists of 300 benign programs gathered at random
from the system files of the Windows XP operating system. A
further 300 different benign programs make up the test set of
unseen benign programs.

IPEMDS uses the "VX Heavens Virus Collection" [5]
database, which is available for free download in the public
domain. Malware samples, especially recent ones, are not
easily available on the Internet. As mentioned earlier, IPEMDS
only considers Win32 malware in PE file format, so it was
tested on the three most popular malware types: 100 worms,
120 trojans, and 100 backdoors collected from the "VX Heavens
Virus Collection" [5].

It is important to note that IPEMDS differentiates between
packed and non-packed files, and that it works regardless of
the packed or non-packed nature of the file.

III. Malwares and executable file Infectors 

The execution of these types of malware is similar to the
execution of any normal application or program that runs
under Windows. Malware uses many Windows functions, available
in kernel mode and user mode, that are called the Application
Programming Interface (API). To call these functions, malware
needs the physical addresses of the required APIs, which
cannot be obtained easily and which Windows will not simply
provide. Malware therefore finds ways to collect these
addresses from Windows [6].

Malware is programmed to know that each normal application
running under Windows has a predefined list of API names and
addresses. The listed APIs are imported by the application
during execution or exported to other Windows applications.
Malware attacks these PE applications to collect API addresses
and to control the execution of the infected applications. It
changes certain fields and locations to redirect the execution
of the normal PE application to its own code, and returns
control to the normal flow after performing its functions. It
also modifies the list of needed API functions to include other
functions required during its execution [6].

IV. Static PE Analyzer 

The PE structure consists of headers and sections that
describe the logical and physical information of file storage
and execution; see Fig. 1. The physical part is called the
"file header", which contains information such as the number of
sections and the size of the optional header. The logical part,
known as the "optional header", holds information such as the
relative virtual address, file and section alignments, the
address of the entry point, and many others. The third header,
the "section header", is also called the "section table". It is
a structure that contains information about the PE sections
that follow this header. It is one of the important layers
scanned for malware detection, because each PE file is
described in a specific directory in the section header [6].

In general, sections are used to store the data and code of
the file separately. Windows applications have nine predefined
sections: .text, .bss, .rdata, .idata, .rsrc, .edata, .pdata,
.reloc, and .debug. Some applications may not need all of these
sections, whereas others may require more sections to suit
their specific needs [6]. The code and instructions of the PE
file are stored in the .text section, whereas its data are
stored in the .bss, .rdata, or .data sections, depending on
their type [6]. The most important sections that malware always
scans are .edata and .idata. These sections contain information
about the physical addresses of the Windows functions, which
are called the application programming interface (API). The
.edata section contains information about the APIs that the
file exports, whereas .idata holds information about the APIs
that the file imports. The "Import Address Table" in .idata is
used by malware analysts to identify whether or not a PE file
is infected [6].
Fig. 1: PE File Layouts on Disk and in RAM

Inspired by the functioning of the Major Histocompatibility
Complex (MHC) in the human body, the static PE analyzer
analyzes PE behavior by observing which APIs the files use when
they execute.

In summary, our static PE analyzer extracts the following
information from the input PE file without disassembling it:

1) Verify that the file is a valid PE file by checking that the
PE signature "PE\0\0" exists, and count how many PE signatures
the file contains; a benign PE file has only one PE signature.

2) Extract from the MS-DOS header the magic number "MZ", which
is the DOS executable signature, and e_lfanew, which contains
the offset of the PE header.

3) Examine how many DOS stubs the file contains; a benign PE
file has only one DOS stub program.

4) Use the e_lfanew value to reach the PE header and extract all
its components; the most important components IPEMDS focuses on
are NumberOfSections, SizeOfOptionalHeader, and Characteristics.

5) Extract all components of the optional header; the most
important components IPEMDS focuses on are SizeOfCode,
AddressOfEntryPoint, ImageBase, SectionAlignment, FileAlignment,
SizeOfImage, SizeOfHeaders, and NumberOfRvaAndSizes.

6) The value of NumberOfRvaAndSizes determines the number of
data directories in the PE file. IPEMDS extracts the array of
data directories, each of which contains a VirtualAddress and a
Size.
7) The value of NumberOfSections determines the number of
sections in the PE file, and each section has its own section
header. IPEMDS extracts the section headers, whose most
important components are VirtualSize, VirtualAddress,
SizeOfRawData, PointerToRawData, and Characteristics.

8) Find the IAT and extract the DLL and API function names.

9) The static PE analyzer extracts 10 features and puts them in
a packed structure, used later to decide whether the PE file is
packed.

10) The static PE analyzer extracts 17 features and puts them in
a heuristic structure, used later as an aid in deciding whether
the PE file is malware.
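The first header-reading steps of the list above can be sketched in Python with the standard struct module. This is only an illustration under stated assumptions, not the IPEMDS implementation: the function names parse_pe_headers and make_sample are hypothetical, the input is a tiny synthetic buffer rather than a real executable, and only the MZ check, e_lfanew, the PE signature count, and three file header fields are covered.

```python
import struct

PE_SIGNATURE = b"PE\0\0"


def parse_pe_headers(data: bytes) -> dict:
    """Read the MZ magic, e_lfanew, PE signature and COFF file header fields."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is a 4-byte little-endian offset stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte PE signature.
    (_machine, n_sections, _timestamp, _sym_ptr, _n_syms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "e_lfanew": e_lfanew,
        "pe_signature_count": data.count(PE_SIGNATURE),
        "NumberOfSections": n_sections,
        "SizeOfOptionalHeader": opt_size,
        "Characteristics": characteristics,
    }


def make_sample() -> bytes:
    """Build a minimal synthetic header blob (assumed values, not a real file)."""
    dos = bytearray(0x80)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x80)  # PE header starts at offset 0x80
    coff = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
    return bytes(dos) + PE_SIGNATURE + coff


info = parse_pe_headers(make_sample())
print(info)
```

Counting raw occurrences of the signature bytes is a crude stand-in for step 1; a real analyzer would have to decide how to treat coincidental matches inside section data.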
value used to avoid collision. Each new, previously unseen pair
is given a new hash sequence value, which is used instead of
the pair's name. Table 1 lists some of the (DLL-API) pairs and
their hash sequence values.
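One simple way to realize this mapping is a dictionary that hands out the next unused integer to every pair it has not seen before. The sketch below assumes exactly that scheme; the function name to_hash_sequence, the lowercasing of names, and the numbering from 1 are illustrative choices, so the resulting values will not match those printed in Table 1.

```python
def to_hash_sequence(pairs, table=None):
    """Map (DLL, API) pairs to integer ids, assigning a new id to each unseen pair."""
    table = {} if table is None else table
    sequence = []
    for dll, api in pairs:
        key = (dll.lower(), api.lower())
        if key not in table:
            table[key] = len(table) + 1  # next unused id, so ids never collide
        sequence.append(table[key])
    return sequence, table


imports = [
    ("kernel32.dll", "CreateFileW"),
    ("user32.dll", "MessageBoxW"),
    ("kernel32.dll", "CreateFileW"),
]
sequence, table = to_hash_sequence(imports)
print(sequence)
```

Passing the same table to later calls keeps the ids consistent across files, which is what makes sequences from different PE files comparable.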

In order to compute similarity, direct comparison of two
sequences is insufficient. IPEMDS therefore first applies
sequence alignment to the hash sequences, with the goal of
finding the longest similar pieces of the two sequences.

Global alignment, which aligns every element in every
sequence, attempts to find the best possible alignment from the
start to the end of the sequences. For example [7]:
Sequence LFTAFFTL 
Sequence LFFTAVTL 



V. Portable Executable file Homeostasis (PeH) 

The static PE analyzer takes the first step toward PE file
homeostasis by gathering information about the APIs that
benign Windows PE files use. It must be stressed that the
homeostatic operation is performed only on benign Windows PE
files.

IPEMDS uses the sequence method, which records the APIs in
profiles using a fixed window size; the best window sizes are
4 or 6. When the whole dataset of benign PE files is input to
the static PE analyzer, the outcome is the DLL names and the
names of the APIs used from them. From these, PeH builds six
special profiles as a database for later use in detection. The
profiles are as follows:

1) DLL&APInormal: for each PE file, records the number of
DLLs and APIs used and their names.

2) PeHSimilarityHashSeqNo: for each benign PE file, records
on one line the hash sequence value of each of its (DLL-API)
pairs. IPEMDS uses this profile in the Global Alignment method,
described later, to find the similarity with other files.

3) allAPIname&HashSeqNo: for all benign PE files, records
each (DLL-API) pair and its hash sequence value.

4) PeHSequences: for each PE file, slides a window over its
APIs one step at a time to produce sequences. For a window of
size x, the number of output sequences is:

NoAPIseq = NoAPI - x + 1 (1)

5) PackediDC: for each PE file, collects 11 features, treats
them as one immature dendritic cell (PackediDC), and records
it in the profile.

6) HashiDC: for each PE file:

• Get its hash sequence;

• Apply global alignment between the current PE file and
the hash sequences of the whole benign PE file dataset;

• Compute the six distinct similarity and difference
measures;

• Record the measure values in the profile as one
HashiDC.
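The sliding window of the PeHSequences profile and the count in equation (1) can be illustrated as follows. The function name api_windows and the sample API names are assumptions for the sketch; only the window mechanics come from the text.

```python
def api_windows(apis, x):
    """Slide a window of size x over the API list, one step at a time."""
    if x <= 0:
        raise ValueError("window size must be positive")
    # range() is empty when the list is shorter than the window.
    return [tuple(apis[i:i + x]) for i in range(len(apis) - x + 1)]


apis = ["CreateFileW", "ReadFile", "WriteFile",
        "CloseHandle", "LoadLibraryW", "GetProcAddress"]
windows = api_windows(apis, 4)

# Equation (1): NoAPIseq = NoAPI - x + 1
print(len(windows), len(apis) - 4 + 1)
```

With six APIs and a window of 4 the profile holds three sequences, in agreement with equation (1).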

VI. Global Alignment 

Before describing global alignment, we must clarify the
meaning of a hash sequence. While the outcome of the static PE
analyzer is (DLL-API) pairs containing their names, the hash



Global alignment: F-TAFFTL - Gap 
FFTA V-TL 

Global alignment was selected, and the Needleman-Wunsch
algorithm, a general global alignment method, is applied to the
hash sequences. The Needleman-Wunsch algorithm is described in [7].
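
For readers who want to see the alignment step concretely, here is a minimal sketch of Needleman-Wunsch on integer hash sequences. The scoring values (match = 1, mismatch = -1, gap = -1) and the use of `None` for a gap are assumptions made for illustration; the paper does not state the scores IPEMDS uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b); a gap is represented by None.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None)
            i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Two hypothetical hash sequences; the second lacks the value 7
total, row_a, row_b = needleman_wunsch([8, 7, 9, 3], [8, 9, 3])
```

Both returned rows have the same length, which is what the similarity and difference measures of the next section require.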

Table 1: (DLL-APIs) pairs and their hash sequence values.

(DLL-APIs) pair                   Hash sequence
kernel32.dll getmodulehandlew     8
kernel32.dll createfilew          7
kernel32.dll loadlibraryw         9
user32.dll messageboxw            3
user32.dll sendmessageA           4
kernel32.dll createfilew          7
user32.dll open                   6
user32.dll messageboxw            3
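
Table 1 shows that a repeated pair (for example kernel32.dll createfilew) keeps the same value. A minimal sketch of such an encoding follows. The numbering scheme (consecutive integers in order of first appearance) and the lower-casing of names are assumptions; the values in Table 1 evidently come from a different assignment.

```python
def encode_pairs(pairs, table=None):
    """Map each (dll, api) pair to an integer; a pair seen before reuses its value.

    `table` is the mapping built so far (for example from the benign files);
    new pairs receive values that are not yet in use.
    """
    table = {} if table is None else table
    next_value = max(table.values(), default=0) + 1
    sequence = []
    for dll, api in pairs:
        key = (dll.lower(), api.lower())
        if key not in table:
            table[key] = next_value
            next_value += 1
        sequence.append(table[key])
    return sequence, table


benign = [("kernel32.dll", "getmodulehandlew"), ("kernel32.dll", "createfilew"),
          ("kernel32.dll", "loadlibraryw"), ("user32.dll", "messageboxw"),
          ("user32.dll", "sendmessageA"), ("kernel32.dll", "createfilew"),
          ("user32.dll", "open"), ("user32.dll", "messageboxw")]
benign_seq, table = encode_pairs(benign)

# A new file: one known pair and one pair never seen in the benign set
new_seq, table = encode_pairs([("KERNEL32.DLL", "CreateFileW"),
                               ("advapi32.dll", "regopenkeyexw")], table)
```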



VII. Similarity and Difference Measures

There exist many similarity and difference measures for
sequences. For greater efficiency, six popular measures were
selected: four similarity measures and two difference
measures.

1) Cosine measure: it computes the cosine of the angle between
two sequences and so captures a scale-invariant notion of
similarity [7]:

$$S_{Cosine}(X, Y) = \frac{X^{T} Y}{\sqrt{X^{T} X \cdot Y^{T} Y}} \qquad (2)$$



2) Extended Jaccard measure: it is computed as the ratio of the
number of attributes shared by X AND Y to the number of
attributes possessed by X OR Y [7]:

$$S_{Jaccard}(X, Y) = \frac{X^{T} Y}{X^{T} X + Y^{T} Y - X^{T} Y} \qquad (3)$$



3) Cosine-Jaccard average: the similarity of two sequences is
also computed as the average of the two previous measures [7]:

$$S_{Cos-Jac}(X, Y) = \frac{S_{Cosine}(X, Y) + S_{Jaccard}(X, Y)}{2} \qquad (4)$$

4) R-Contiguous: the rcb matching rule is defined as
follows: if x and y are equal-length strings defined over a
finite alphabet, match(x, y) is true if x and y agree in at
least r contiguous locations [8] [9]. Here, however, the value
of r is not fixed; instead, the maximum contiguous match
between the two sequences is sought.

5) Hamming distance: the Hamming distance between two
strings is defined as the number of positions at which the
two strings differ [10].

6) Euclidean distance: the Euclidean distance is defined as
[9][11]:

$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^{2}} = \lVert x - y \rVert \qquad (5)$$
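
The six measures can be sketched in a few lines each. The sketch assumes that both inputs are numeric sequences of equal length (for example, aligned hash sequences with gaps encoded as 0, which is an assumption of this example), and all function names are illustrative.

```python
import math


def cosine(x, y):
    """Equation (2): cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm if norm else 0.0


def extended_jaccard(x, y):
    """Equation (3): shared weight over combined weight."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom else 0.0


def cos_jac(x, y):
    """Equation (4): average of the cosine and extended Jaccard measures."""
    return (cosine(x, y) + extended_jaccard(x, y)) / 2


def max_contiguous_match(x, y):
    """Longest run of positions at which x and y agree (r is not fixed)."""
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def hamming(x, y):
    """Number of positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def euclidean(x, y):
    """Equation (5): Euclidean norm of x - y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For two identical sequences the similarity measures reach 1 and both distances are 0, which is the behaviour the HashAg construction later relies on when it takes minima of similarities and maxima of distances.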



VIII. Packed executable classification 

A. PE Packers 

In general, runtime packers compress the original executable 
and attach an unpacking stub to it. Upon execution of the 
packed executable, the stub unpacks the original code (and 
data) and transfers control to it. PE Packers typically follow 
that scheme as well [12]. 

Generally, a packed executable is built from two main parts
in a two-phase packing process. First, the original
executable is compressed and stored in the packed executable as
data. Second, a decompression module is added to the packed
executable. The decompression module is used to restore the
original executable [13].

B. Packing Detection 

Packed PE files were analyzed, and it was found that nearly
every type of packed PE file can be detected through common
characteristics in the PE header that differ from those of
normal, unpacked files. For example, a packed file must unpack
its packed code before the intended original code can execute.
To unpack and rewrite the code, the code section must have
both the executable and the writable attribute at the same
time. Normal PE files, however, typically do not contain
sections that are both executable and writable [14].

The IPEMDS classification approach generalizes much better
than signature-based approaches and is able to distinguish
between packed and non-packed executables with very low false
positive and false negative rates.

It uses binary static analysis to extract information, and this
information allows us to translate each executable into the
pattern recognition receptors (PRRs) of one iDC. It then applies
the TLR algorithm to these iDCs to distinguish between packed
and non-packed executables.

In IPEMDS, the technique for detecting encoded executable
files uses these differences between packed and normal files,
together with entropy analysis of some parts of the PE file. To
present the different features of packed and non-packed PE
files effectively, the PRRs of an iDC are defined; they consist
of 11 PRRs that expose these differences. The TLR algorithm is
then used to classify a given PE file as "Packed" or
"Non-Packed". It performs very well, as it checks only the 11
selected PRRs.



C. Feature Extraction Module 

If a file is packed, some relationships between its attributes
are broken. In this paper, this property is used to detect
packed PE files: the detection technique uses the differences
between the attributes of normal and packed files in the PE
file header [14].

From PE file to PRRs

This subsection describes the feature extraction process, which
translates a PE file into a list of packing signals that serve
as the PRRs of iDCs. The eleven packing signals are:

• Number of Standard and Non Standard Sections. The
PE file of a non-packed application usually contains a well
defined set of standard sections. Packed executables, on the
other hand, often contain code and data sections that do not
follow these standard names. For example, the UPX executable
packing tool (http://upx.sourceforge.net) usually creates PE
files that contain two sections named .UPX0 and .UPX1,
respectively, and a section named .rsrc. The two sections
.UPX0 and .UPX1 are not standard and may be used to
distinguish an executable packed with UPX from non-packed
executables. Besides UPX, a number of other packers usually
generate PE files whose code and data sections have
non-standard names. Therefore, counting how many standard and
non-standard section names are present in a PE file gives us a
clue as to whether the executable is packed [15].

• Number of Executable Sections. While analyzing the
output of executable packing tools, we noticed that the PE
files of some packed executables do not carry any
executable section. Therefore, counting the number of
executable sections in the PE file helps in distinguishing
between packed and non-packed executables [15] [14]. If
there is a code section that is not executable, IPEMDS can
consider the executable code to have been modified.

• Number of Readable/Writable/Executable Sections.
A packed file needs to include at least one section that is
readable, writable, and executable at the same time, which
means that an executable section can be modified at run
time. On the other hand, the executable sections (usually
the .text section) in the PE file of a non-packed
application do not need to be writable, and the writable
section flag is not set. Therefore, counting the number of
sections that are writable and executable at the same time
adds a piece of evidence to the conclusion on whether the
executable is packed [15] [14].

• Number of Entries in the IAT. Most non-packed
executables import many external functions. Packed
executables, on the other hand, often import very few
external functions, mainly because the unpacking routine
does not need many of them. The basic operations the
unpacking routine performs are reading and writing memory
locations in order to decrypt the code of the packed
application on the fly. For example, no window on the screen
or network operation is usually needed. This is reflected in
a small number of entries in the IAT of a packed
executable [15].

• Position of the PE signature. This signal is set if the
position of the PE signature is less than the size of
IMAGE_DOS_HEADER; it relates to the size calculation and
resizing problem of the created sections.

• Looking for SpecialAPIs. PE packers typically remove
most of the original import data and keep or add only a few
imports, such as LoadLibraryA, GetProcAddress, and
ExitProcess.

• PE Header, Code, Data, and File Entropy. The
encrypted code of an application P that is packed (i.e.
hidden) into P' is usually stored in a code or data section
of the PE file. So we measure the byte entropy of the code
and data sections in the PE file. If the entropy of a section
is close to 8 bits, which is the maximum byte entropy, the
section likely contains encrypted code [15].

There are parts of the PE header dedicated to optional 
fields that are not necessary for the correct loading of the 
program into memory by the operating system. Some 
packing tools may therefore hide encrypted code in those 
unused portions of the PE header. For this reason we 
measure the byte entropy of the PE header as well. 
Considering that the PE file is quite complex and contains 
other such unused spaces (for example, portions of the 
header of each section), the encrypted code may be hidden 
in several other locations. Therefore, we also measure the 
entropy of the PE file as a whole to take into account 
these cases [15]. 

Entropy analysis also does not require packer signature
updates, which are a limitation of signature-based
classification methods [16]. It relies on the fact that the
measured entropy of compressed information is higher than that
of the original information [13]. Shannon's formula measures
information entropy as follows [13, 16]:

$$H(x) = -\sum_{i=1}^{n} p(i) \log_{b} p(i) \qquad (6)$$

where H(x) is the measured entropy value and p(i) is 
the probability of an ith unit of information in event x's 
series of n symbols. The base number of the logarithm can 
be any real number greater than 1. However, 2, 10, and 
Euler's number are chosen in general. 
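
As an illustration of equation (6) with b = 2, the byte entropy of a header, a section, or a whole file can be computed as in the sketch below. The function name is illustrative, and reading the relevant bytes out of the PE file is assumed to happen elsewhere.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, in the range [0, 8]."""
    if not data:
        return 0.0
    n = len(data)
    # p(i) is the relative frequency of byte value i in the data
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(data).values())


low = byte_entropy(b"\x00" * 1024)           # a single repeated byte value
high = byte_entropy(bytes(range(256)) * 4)   # every byte value equally frequent
```

With base 2 the result is bounded by 8 bits, which matches the [0-8] ranges that the feature summary gives for the entropy PRRs.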
The 11 extracted PRRs described above are summarized in Table 2.

Table 2: Summary of the features extracted from a PE file.

No.  PRR                                              Range of values
1.   Number of standard sections                      integer ≥ 0
2.   Number of non-standard sections                  integer ≥ 0
3.   Number of Executable sections                    integer ≥ 0
4.   Number of Readable/Writable/Executable sections  integer ≥ 0
5.   PEsig-less-DOSheader*                            [true, false]
6.   SpecialAPIs*                                     integer [1-3]
7.   Number of entries in the IAT*                    integer > 0, or -1 if the PE file has no IAT
8.   Entropy of the PE header                         [0-8]
9.   Entropy of the code sections                     [0-8], or -1 if the PE file has no code section
10.  Entropy of the data sections                     [0-8], or -1 if the PE file has no data section
11.  Entropy of the entire PE file                    [0-8]



D. Detecting Packing status by TLR Algorithm 

The name 'TLR' refers to Toll-like Receptors, the
membrane-bound proteins that, biologically, are responsible for
processing changes in PAMP concentration by DCs. The signals
used in the TLR algorithm are binary, representing 'signal
present' or 'signal not present'. A list of signal values,
termed the 'infectious signal list', is compiled during a short
training period. This list consists of discrete signal values
which, when sensed by a DC, 'activate' the TLRs (i.e. sensors)
on the DC. The infectious signal list is initially generated to
cover all possible values of the three signals [17].

IPEMDS treats the packing status as a danger status and detects
it with the TLR algorithm; the packing status is later used as a
danger signal for malware detection. The previous subsection
showed how to collect the 8 PRRs (those not marked in Table 2)
for the iDCs, where each benign PE file is represented by one
iDC, while the new PE file is represented by an antigen (Ag)
with 11 PRRs. The TLR algorithm compares the Ag with all iDCs
and then decides its status (Packed or NotPacked). The TLR
algorithm is as follows:



Algorithm 1: TLR Algorithm for Packed Detection.

Input: All benign PE files from PeH;
       new PE file whose status (Packed or NotPacked) is to be detected
Output: Packed or NotPacked
For (each benign PE file in PeH) Do
    Extract the 8 PRRs; /* not marked in Table 2 */
    Create iDC with signals PRRs;
    Record iDC in PackediDC profile;
End For
Extract the 11 PRRs for the new PE file;
Create Ag with signals PRRs;
No-sDC = 0;
No-mDC = 0;
For (iDC in PackediDC) Do
    Compare Ag's PRRs with iDC's PRRs;
    Update No-sDC;
    Update No-mDC;
End For
If (No-mDC > No-sDC)
    Print "Ag is Packed";
Else
    Print "Ag is NotPacked";
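
The comparison step of Algorithm 1 is not spelled out in the text, so the following is only one possible reading under stated assumptions: the Ag and every iDC are numeric vectors over the same PRRs in the same order, a PRR "agrees" when it lies within a relative tolerance of the benign value, and an iDC becomes semi-mature (sDC) when at least half of its PRRs agree and mature (mDC) otherwise. The tolerance, the one-half rule, and all names are hypothetical.

```python
def tlr_packed(ag, idcs, tolerance=0.1, min_agree=0.5):
    """Return "Packed" or "NotPacked" for antigen `ag`.

    ag   : sequence of numeric PRR values of the new PE file
    idcs : list of PRR vectors, one per benign PE file (the PackediDC profile)
    """
    n_sdc = n_mdc = 0
    for idc in idcs:
        # Assumed comparison rule: count PRRs close to the benign value
        agree = sum(abs(a - b) <= tolerance * max(abs(b), 1.0)
                    for a, b in zip(ag, idc))
        if agree / len(idc) >= min_agree:
            n_sdc += 1   # Ag resembles this benign file
        else:
            n_mdc += 1   # Ag deviates from this benign file
    return "Packed" if n_mdc > n_sdc else "NotPacked"


# Hypothetical benign profile: section counts followed by four entropies
benign_profile = [[5, 0, 1, 0, 5.9, 6.2, 4.0, 6.0]] * 3
same = tlr_packed([5, 0, 1, 0, 5.9, 6.2, 4.0, 6.0], benign_profile)
odd = tlr_packed([1, 2, 0, 1, 7.9, 7.9, 7.8, 7.9], benign_profile)
```

The final majority test (No-mDC > No-sDC) is the only part taken directly from Algorithm 1.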



IX. Heuristic Analysis of 32-Bit Windows Malware 

IPEMDS uses several key heuristic techniques for Win32
malware as a secondary aid in detecting malware. They are as
follows:

A. The relocation module

In a normal program, the positions of variables in memory are
calculated at compile time, so programmers do not need to
relocate them; the variables are used directly by name. For
virus programs, however, the locations of the virus variables
vary with the infected host program: the variables end up at
different positions because the virus is attached to different
host programs and loaded into memory with them. Since these
variables or constants do not have fixed addresses, the virus
must itself relocate these addresses in order to access the
relevant resources when executed in memory. Therefore, a
Windows PE virus must have an inherent relocation module in
order to execute correctly on the Windows platform. The module
is usually at the beginning of the virus program, consists of
little code, and needs few changes; its common code is given in
[18, 1]. The relocation module was chosen over the other
modules because it is usually at the beginning of the virus
source code and is always small, little changed, and easy to
extract [1].

B. The module of obtaining API addresses (IAT not in its place)

In general, normal programs have an import address table that
holds the actual addresses of API functions. Thus, when the
program calls an API function, the corresponding address can be
found in the import address table of the Windows PE file. A
Win32 PE virus, however, has only one code section, which does
not include an import address table, so as to reduce the size of
the virus code. The Windows PE virus program therefore cannot
obtain the addresses of API functions directly and must first
locate them in the dynamic link library. Consequently, a
Windows PE virus must have a module that obtains the addresses
of the Windows API functions it calls [1, 18].

C. The module of searching target files 

PE viruses need to search for target files continuously in
order to spread themselves. Therefore, PE viruses need a
target-file searching module [18, 1]. In Win32 assembly, file
searching is generally achieved through the FindFirstFile and
FindNextFile API functions [18].

D. The module of mapping file to the memory 

Memory-mapped files provide a group of independent functions.
Applications can read and write the file on disk directly
through pointers instead of using normal I/O functions. Memory
mapping typically improves I/O performance because it does not
require copying data between buffers. The data in the file can
be operated on directly in memory; thus a PE virus can infect
target files quickly, which greatly improves access speed and
reduces the system resources occupied by the virus [1, 18]. The
CreateFileMapping API function is used for memory mapping.

E. Section Virtual Size is incorrect

Some malware infects sections without changing their virtual
size in the Section_Header, or leaves a virtual size that is not
rounded up to the closest section alignment value. It is
therefore worthwhile to check each section's virtual size.



F. Non standard NumberOfRvaAndSizes value 

NumberOfRvaAndSizes is a value in the Optional_Header that
gives the number of valid entries in the DATA_DIRECTORY array.
This value is not fixed, but its maximum allowable value is 16.
It is therefore suspicious if the value exceeds 16, or if it is
zero (set so to reduce the file size).

G. Suspicious Section Characteristics 

All sections have a Characteristics field, a set of flags in
the Section_Header that describes how the section's memory
should be treated. The code section has the executable flag but
does not need the writable attribute because the data is
separated. Very often the virus section does not have the
executable characteristic but is writable only, or is both
executable and writable. Both of these cases must be considered
suspicious. Some viruses fail to set the Characteristics field
and leave it at 0. That is also suspicious [19]. This heuristic
technique therefore yields two features:

1) Writable executable sections.

2) Suspicious sections: the Characteristics field is left at zero.
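
A minimal sketch of these two features is given below. The flag values are the standard PE section characteristics constants; the function name and the idea of passing the raw Characteristics values as a list of integers are assumptions of this example (parsing the section headers is taken to happen elsewhere).

```python
# Standard PE section characteristic flags
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def section_flag_features(characteristics):
    """Count writable+executable sections and sections with no flags set.

    characteristics: one 32-bit Characteristics value per section header.
    """
    writable_executable = sum(
        1 for c in characteristics
        if c & IMAGE_SCN_MEM_EXECUTE and c & IMAGE_SCN_MEM_WRITE)
    zero_characteristics = sum(1 for c in characteristics if c == 0)
    return {"writable_executable": writable_executable,
            "zero_characteristics": zero_characteristics}


# Hypothetical file: a normal .text, a normal .data, one RWX section,
# and one section whose Characteristics field was left at 0
features = section_flag_features([0x60000020, 0xC0000040, 0xE0000020, 0])
```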

H. Entry-Point Obscure 

The entry point is the address, relative to the image base, at
which execution starts when the executable file is loaded into
memory. It is the value that must be added to the base address
to obtain the linear address [20]. Malware writers use the
entry-point address in several obscuring techniques to reach the
malware's code, for example by selecting a position near the
original entry point of the application, so that the virus code
is likely to get control when the original application is
executed [19]. IPEMDS therefore checks whether or not the
entry-point address points into the code section.

I. Number of Non-Standard Sections 

Described earlier in Packed feature extraction module 
section. 

J. Possible Header Infection 

If the entry point of a PE program does not point into any of 
the sections but points to the area after the PE header and 
before the first section's raw data, then the PE file is probably 
infected with a header infector [19]. 

K. Renaming Existing Sections 

Some viruses change the section name to a random string.
As a result, heuristic scanners cannot pinpoint the virus easily
based on the section name and its characteristics [19]. IPEMDS
therefore checks the section names against the standard names.

L. Import Address Table Is Patched 

If the import table of the application has GetProcAddress()
and GetModuleHandleA() API imports and imports these two
APIs by ordinal at the same time, then the import table is
patched for sure. This is suspicious [19].

M. API String Usage and Suspicious KERNEL32.DLL Imports 

A very effective antiheuristic/antidisassembly trick appears in
various modern viruses. An example is the W32/Dengue virus,
which uses no API strings to access particular APIs from the
Win32 set. Normally, an API import happens by using the
name of the API, such as FindFirstFileA(), OpenFile(),
ReadFile(), WriteFile(), and so on, as done by many first-
generation viruses. A set of suspicious API strings will
therefore appear in nonencrypted Win32 viruses [19].

The import table must be checked for a combination of API
imports. The file is suspicious if there are KERNEL32.DLL
imports for a combination of GetModuleHandle(), Sleep(),
FindFirstFile(), FindNextFile(), MoveFile(), move(),
GetWindowsDirectory(), WinExec(), DeleteFile(), WriteFile(),
CreateFile(), CreateFileA(), CreateProcess(), deletefile(),
createprocess(), readprocessmemory(), writeprocessmemory(), and
virtualallocex(), or if the *.EXE string is present.

The *.EXE string, together with almost a dozen APIs that search
for files and make file modifications, can make the
disassembly of the virus much easier and is potentially useful
for heuristic scanning [19].
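
The import check can be sketched as a lookup against the API names listed above. How many hits make a file suspicious is not stated in the text, so this sketch only reports the hits; the function name, the input format (a list of (DLL, API) pairs), and the stripping of a trailing A/W suffix are assumptions.

```python
SUSPICIOUS_KERNEL32_APIS = {
    "getmodulehandle", "sleep", "findfirstfile", "findnextfile", "movefile",
    "move", "getwindowsdirectory", "winexec", "deletefile", "writefile",
    "createfile", "createprocess", "readprocessmemory", "writeprocessmemory",
    "virtualallocex",
}


def suspicious_import_hits(imports):
    """Return the sorted suspicious API names imported from KERNEL32.DLL."""
    hits = set()
    for dll, api in imports:
        if dll.lower() != "kernel32.dll":
            continue
        name = api.lower()
        # Treat the ANSI/Unicode variants (FooA / FooW) like the base name
        if name not in SUSPICIOUS_KERNEL32_APIS and name[-1:] in ("a", "w"):
            name = name[:-1]
        if name in SUSPICIOUS_KERNEL32_APIS:
            hits.add(name)
    return sorted(hits)


hits = suspicious_import_hits([
    ("KERNEL32.DLL", "FindFirstFileA"),
    ("kernel32.dll", "WriteFile"),
    ("user32.dll", "MessageBoxW"),
    ("kernel32.dll", "GetTickCount"),
])
```

The number of hits could then be added to the heuristic count, but the exact contribution is left open here.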

N. Multiple MS-Stub 

IPEMDS notes that several PE malware samples have more than
one MS-Stub, whereas a benign PE file should have only one.
Counting them is therefore a useful indicator of suspicion.

O. Multiple PE Headers 

When a PE application has more than one PE header, the
file must be considered suspicious because the PE header
contains many nonused or constant fields. This is the case if
the e_lfanew field points to the second half of the program and
it is possible to find another PE header near the beginning of
the file [19], or if the PE signature is duplicated more than
once.

P. Heuristic count

All the previous heuristic features are summed into a heuristic
count, which is used as an aid in Hash-Scan and TLR (First line
of Defense).

X. Hash-Scan and TLR (First line of Defense) 

All the techniques and algorithms explained previously,
together with the information they gather (the PeH, the HashiDC
profile, the PackediDC profile, and the heuristic count), are
used here in a special detection technique called Hash-Scan and
TLR. It can be summarized in the following steps, for each
input file:

1) Get the Packed status for the current PE file. 

2) Get heuristic count for the current PE file. 

3) Read the allAPIname&HashSeqNo profile, which holds all
the (DLL-APIs) pairs of the PeH and their hash sequence
values.

4) Extract the (DLL-APIs) pairs and find their hash sequence
values, taking into account the hash sequence values of the
PeH: if a pair already exists in the PeH, it must receive the
same value; otherwise it must receive a new value that is not
already in use.

5) Read the PeHSimilarityHashSeqNo profile, which holds the
hash sequence values of each benign file separately.

6) Apply Global Alignment between the new file and all
benign PE files in the PeH.

7) Apply the similarity and difference measures to the strings
resulting from step 4.

8) Create a Hash Antigen (HashAg) for the current PE file
containing the following:

a) Maximum Euclidean distance value.

b) Maximum Hamming distance value.

c) Minimum Cosine measure value.

d) Minimum Extended Jaccard measure value.

e) Minimum Cosine-Jaccard average value.

f) Minimum R-Contiguous value.

g) Packed status.

h) Heuristic count.

9) Get the danger status of the current HashAg using the TLR
algorithm, whose inputs are the HashAg and the HashiDC profile,
in the same way as the TLR algorithm is used in packed detection.
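
Step 8 can be sketched as a simple aggregation over the per-file measures. The data layout (one dictionary of six measures per benign file) and all field names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class HashAg:
    max_euclidean: float
    max_hamming: int
    min_cosine: float
    min_jaccard: float
    min_cos_jac: float
    min_r_contiguous: int
    packed: bool
    heuristic_count: int


def build_hash_ag(measures, packed, heuristic_count):
    """Aggregate the measures against every benign file into one HashAg.

    measures: list of dicts with keys 'euclidean', 'hamming', 'cosine',
              'jaccard', 'cos_jac', 'r_contiguous' (one dict per benign file).
    """
    if not measures:
        raise ValueError("at least one benign file is required")
    return HashAg(
        max_euclidean=max(m["euclidean"] for m in measures),
        max_hamming=max(m["hamming"] for m in measures),
        min_cosine=min(m["cosine"] for m in measures),
        min_jaccard=min(m["jaccard"] for m in measures),
        min_cos_jac=min(m["cos_jac"] for m in measures),
        min_r_contiguous=min(m["r_contiguous"] for m in measures),
        packed=packed,
        heuristic_count=heuristic_count,
    )


ag = build_hash_ag(
    [{"euclidean": 2.0, "hamming": 1, "cosine": 0.9, "jaccard": 0.8,
      "cos_jac": 0.85, "r_contiguous": 3},
     {"euclidean": 5.0, "hamming": 4, "cosine": 0.4, "jaccard": 0.3,
      "cos_jac": 0.35, "r_contiguous": 1}],
    packed=True, heuristic_count=2)
```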

XI. APIs Sequences Scan & DCAs (Second line of Defense)

In this line, IPEMDS emphasizes a type of matching of API
sequences (whose length depends on the window size parameter)
between the suspicious file and the PeH. This line performs
several comparisons, and three algorithms arbitrate on the
comparison results: cDCA, dDCA1, and our proposed dDCA2.

A. APIs Sequences Scan 

When the PeH was built, the PeHSequences profile was
composed from API sequences by sliding a window over the
APIs of each benign PE file in the PeH. The APIs of a new
suspicious file are likewise organized into sequences using the
same window size. The results of this step for each new
suspicious file are MaxMatchSeq, MaxNotMatchSeq, MaxMatchDLL,
MaxNotMatchDLL, MaxMatchAPI, and MaxNotMatchAPI. These values,
together with the heuristic count and the packed status, are
combined in a special way to form the four signals of the DCAs.

B. classical Dendritic Cell Algorithm (cDCA) 

The purpose of a DC algorithm is to correlate disparate
data streams in the form of antigen and signals. The DCA is
not a classification algorithm, but shares properties with
certain filtering techniques. It provides information
representing how anomalous a group of antigen is, not simply
whether a data item is anomalous or not. This is achieved
through the generation of an anomaly coefficient value, termed
the "mature context antigen value" (MCAV). The labeling of
antigen data with an MCAV coefficient is performed by
correlating a time series of input signals with a group of
antigen. The signal categorization is based on the four-signal
model: PAMP, danger signals, safe signals, and inflammation.
The co-occurrence of antigen and high/low signal values forms
the basis of categorization for the antigen data [21] [22].

The DCA is a population-based algorithm, with the
population consisting of a set of interacting objects, each
representing one cell. Each DC processes input signals to form
a set of cumulatively updated output signals, in addition to
collecting antigen throughout the duration of the sampling
stage. Each DC can exist in one of three states (immature,
semi-mature, and mature) at any point in time. The difference
between the semi-mature and mature states is controlled by a
single variable, determined by the relative difference between
two output signals produced by the DC. The initiation of the
state change from immature to either mature or semi-mature is
facilitated not by the collection of antigen, but by sufficient
exposure to signals. This exposure is limited by the assigned
"migration threshold" [23][24][21][22].

IPEMDS uses cDCA with an antigen multiplier: to assess the
type of an antigen, the antigen is presented multiple times,
each time to a different iDC, so that an MCAV value can be
generated for it from different iDCs (see Algorithm 2). The
general form of the signal processing equation is shown in
equation (7) [21] [22]:

$$Output = \Big( \sum_{n} (P_n \cdot P_w) + \sum_{n} (D_n \cdot D_w) + \sum_{n} (S_n \cdot S_w) \Big) \cdot (1 + I) \qquad (7)$$

where P_w are the PAMP-related weights, D_w the weights for
danger signals, etc., and I is the inflammation signal.
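
A direct transcription of the signal processing equation (7) into code might look as follows. The weight values used in the example call are purely illustrative; the paper does not list the weights IPEMDS uses, and the function name is an assumption.

```python
def dca_output(pamp, danger, safe, weights, inflammation=0.0):
    """Weighted sum of the signal categories, scaled by (1 + inflammation).

    pamp, danger, safe : lists of signal values of each category
    weights            : (P_w, D_w, S_w) for the output being computed
    """
    p_w, d_w, s_w = weights
    total = (sum(p * p_w for p in pamp)
             + sum(d * d_w for d in danger)
             + sum(s * s_w for s in safe))
    return total * (1 + inflammation)


# Illustrative weights only (assumed, not taken from the paper)
csm = dca_output([1.0], [2.0], [3.0], weights=(2, 1, 2))
boosted = dca_output([1.0], [2.0], [3.0], weights=(2, 1, 2), inflammation=1.0)
```

In the classical DCA the same function is evaluated once per output signal (CSM, semi-mature, mature), each with its own weight triple.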



Algorithm 2: cDCA Algorithm for Malware Detection.

Input: Ag with four signals (PAMP, DS, SS, InfSig)
Output: Benign or Malware
Initialize: AgMultiplier, PopDCsize, iDCLife, cDCAthreshold
For (i to AgMultiplier) Do
    Copy Ag;
End For
For (iDC in PopDCsize) Do /* Initialize iDC */
    Initialize iDC: LifeSpan, CSM, semimature, mature, storeAg;
    Random MigrationThreshold;
End For
While (AgMultiplier) Do
    For (iDC in PopDCsize) Do
        While (CSM output signal < MigrationThreshold) Do
            get antigen;
            store antigen;
            get signals;
            calculate interim output signals;
            update cumulative output signals;
        End While
        cell location update to lymph node;
        If (semi-mature output > mature output) Then
            cell context is assigned as 0;
        Else
            cell context is assigned as 1;
    End For
    Get MCAV for current Ag;
End While
Get MCAV mean;
If (MCAVmean > cDCAthreshold) Then
    Print "Malware PE";
Else
    Print "Benign PE";

C. deterministic Dendritic Cell Algorithm (dDCA1 & dDCA2)

The deterministic DCA (dDCA) is a simplified and more
predictable version of the DCA. Since the DCA's original
inception, two major improvements have been proposed for it for
the purpose of optimization, namely the antigen multiplier and
time windows; they have the same effect on the DCA [25] [26].
The new variation of the DCA, called dDCA, has the following
enhanced features:

• Three input signal categories are reduced to two, i.e. 
danger and safe signal; 

• Random migration threshold is replaced with uniform 
distribution of lifespan values in a population; 

• Dedicated storage and sampling of antigens is replaced 
with sampling of all antigens by DCs; 

• Instead of forming a sampling pool, the signals' data is 
processed by all DCs. As a result, output signals are 
calculated once for population of DCs; 

• Only one factor (K) is calculated for each DC to arrive at 
a context. Negative values of K reflect a benign context 
and positive values indicate a malicious context. 

Signal processing is simplified by reducing the number of 
input signals and using a weight assigning scheme. Two 
outputs are calculated: 

(1) accumulation of signals (CSM), 

(2) score (K), to which the threshold is applied for 
classification. 

The new signal processing procedure is shown in Equations 8
and 9, where S and D are the input values of the safe and
danger signals, respectively [27].

CSM = S + D (8)
K = D - 2S (9)

IPEMDS uses dDCA with changes that suit its application,
called here dDCA1, and presents another variant, called dDCA2.
dDCA2 differs from dDCA1 in where the number of mature DCs is
counted and in that it does not need to store Ags and count
them later, so the MCAV is calculated differently. dDCA2 gives
promising results, as will be seen later. The two algorithms
are stated together in Algorithm 3, with markers indicating
which steps are used in dDCA1 and which in dDCA2, to simplify
the comparison between them.



Algorithm 3: dDCA1 & dDCA2 Algorithm for Malware Detection.

Input: Ag with two signals (DS, SS)
Output: Benign or Malware
Initialize: AgMultiplier, PopDCsize, iDCLifespan, dDCA1 threshold
For (i to AgMultiplier) Do
    Copy Ag;
End For
For (iDC in PopDCsize) Do /* Initialize iDC */
    Initialize iDC: RandomLifeSpan, CSM, K, storeAg;
End For
While (AgMultiplier) Do
    Get CSM;
    Get K;
    For (iDC in PopDCsize) Do
        While (iDCLifespan > 0) Do
            get antigen;
            store antigen; /* dDCA1 */
            Get iDC.K;
            iDCLifespan--;
        End While
        cell location update to lymph node;
        If (iDC.K < 0) Then
            cell context is assigned as 0;
        Else
            cell context is assigned as 1;
        Count no. of Mature cell; /* dDCA1 */
        Count no. of Stored Ag; /* dDCA1 */
    End For
    Get MCAV for current Ag; /* dDCA1 */
    Count no. of Mature cell; /* dDCA2 */
End While
Get MCAV mean;
If (MCAVmean > dDCA1 threshold) Then
    Print "Malware PE";
Else
    Print "Benign PE";
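
The following is a simplified sketch in the spirit of the dDCA2 branch of Algorithm 3: it counts mature cells directly and stores no antigens. It is not the authors' implementation. The lifespans, the antigen multiplier, and the 0.5 threshold are hypothetical values, and each DC here simply accumulates K once per unit of lifespan.

```python
def ddca2_mcav(danger, safe, lifespans, ag_multiplier=10):
    """Fraction of DC presentations that end in a mature (context 1) state."""
    k = danger - 2 * safe            # equation (9); CSM = safe + danger is not needed here
    mature = total = 0
    for _ in range(ag_multiplier):   # the same antigen is presented several times
        for lifespan in lifespans:   # one entry per DC in the population
            k_total = 0.0
            for _ in range(lifespan):
                k_total += k         # accumulate K until the lifespan is used up
            total += 1
            if k_total >= 0:         # Algorithm 3: K < 0 gives context 0, else context 1
                mature += 1
    return mature / total if total else 0.0


def classify(mcav, threshold=0.5):
    """Apply the final threshold test of Algorithm 3 (threshold is assumed)."""
    return "Malware PE" if mcav > threshold else "Benign PE"


suspicious = classify(ddca2_mcav(danger=0.8, safe=0.1, lifespans=[3, 5, 7]))
harmless = classify(ddca2_mcav(danger=0.2, safe=0.5, lifespans=[3, 5, 7]))
```

Because one antigen carries fixed signal values in this sketch, every DC reaches the same context, so the MCAV is either 0 or 1; a fuller implementation would vary the signals across presentations.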



The steps of APIs Sequences Scan & DCAs (Second line of
Defense) can be summarized as follows:

1) Get the Packed status for the current suspicious PE file.

2) Get the heuristic count for the current suspicious PE file.

3) Read the PeHSequences profile.

4) Read the suspicious file's DLLs and APIs.

5) Find the maximum match for APIs and DLLs between the
current suspicious PE file and any one of the benign PE files
in the PeH.

6) If a match is found in step 5, calculate and record the
following: MaxMatchSeq, MaxNotMatchSeq, MaxMatchDLL,
MaxNotMatchDLL, MaxMatchAPI, and MaxNotMatchAPI.

7) Create a danger antigen and set its signals as follows:
a. Ag.name = suspicious file name;
b. Ag.PAMP = heuristcount + PackedPE + MaxNotMatchDLL;
c. Ag.DS = (MaxNotMatchSeq + MaxNotMatchAPI) / 2;
d. Ag.SS = (MaxMatchSeq + MaxMatchAPI + MaxMatchDLL) / 3;
e. Ag.InfSig = heuristcount + PackedPE;
f. Ag.MCAV = 0;

8) Get Ag.MCAV = cDCA(Ag); Get Ag.MCAV = dDCA1(Ag);
Get Ag.MCAV = dDCA2(Ag);

9) The final decision, Benign or Malware, is the one on which
two of the three algorithms agree.
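
The signal formulas of the antigen and the two-out-of-three decision can be sketched as below. The `Antigen` class, the dictionary keys, and the encoding of PackedPE as 1 or 0 are assumptions of this example; only the arithmetic is taken from the steps above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Antigen:
    name: str
    pamp: float
    ds: float
    ss: float
    inf_sig: float
    mcav: float = 0.0


def make_antigen(name, heuristic_count, packed, m):
    """Build the danger antigen from the scan results `m` (a dict of the six Max* values)."""
    packed_pe = 1 if packed else 0   # assumed numeric encoding of the packed status
    return Antigen(
        name=name,
        pamp=heuristic_count + packed_pe + m["MaxNotMatchDLL"],
        ds=(m["MaxNotMatchSeq"] + m["MaxNotMatchAPI"]) / 2,
        ss=(m["MaxMatchSeq"] + m["MaxMatchAPI"] + m["MaxMatchDLL"]) / 3,
        inf_sig=heuristic_count + packed_pe,
    )


def final_decision(verdicts):
    """Return the label on which the majority of the three algorithms agree."""
    label, _votes = Counter(verdicts).most_common(1)[0]
    return label


scan = {"MaxMatchSeq": 3, "MaxNotMatchSeq": 5, "MaxMatchDLL": 2,
        "MaxNotMatchDLL": 1, "MaxMatchAPI": 4, "MaxNotMatchAPI": 7}
ag = make_antigen("sample.exe", heuristic_count=2, packed=True, m=scan)
decision = final_decision(["Malware", "Benign", "Malware"])
```

With three binary verdicts a majority always exists, so no tie-breaking rule is needed.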



XII. IPEMDS Properties

Figure 2 shows the overall diagram of IPEMDS. The special
properties of IPEMDS are:

• It depends only on benign PE executable files to build its
knowledge, the PE Homeostatic (PeH), and uses it to diagnose
whether any new PE executable is benign or malware.

• Its performance can be improved further by carefully
selecting a varied set of benign PE executable files.

• It is flexible, because it can detect any type of Win32
malware.

• Relative to the small amount of knowledge that IPEMDS has,
it obtains a high detection rate and a very low false alarm
rate, and this performance promises to improve.

• It needs no training period; it only extracts some special
information from a finite number of benign PE executable
files.

• It depends on Danger theory, a second generation of
immune system theories, to form two layers of defense.

• The detection speed of the system is acceptable in
comparison with common antivirus products.

• The system permits deleting all PeH contents in order to
build a new one. This feature is useful when new benign
executable files are installed in the operating system,
although it is unlikely that IPEMDS will detect them as
malware.

• The number of benign executable files selected to build a
PeH is small compared with the large number of benign
executable files in a personal computer system. Here, 300 of
5228 benign files were selected.

• The experimental results in the next section show the
importance of the two lines in IPEMDS: the first line is
sensitive in recognizing new benign files, whereas the second
line recognizes the malware. Combining them gives the best
results, with a high detection rate (0.98) and a low false
alarm rate (0.11).

• IPEMDS is implemented in the C# language.



XIII. 



Experimental Results 



The IPEMDS is evaluated with the standard performance 
measures: Detection Rate (TPR) and False Alarm Rate (labelled 
TNR in the tables, although a false alarm rate is strictly a 
false positive rate). 

Several series of experiments were carried out to test IPEMDS 
performance, as follows: 

1) Run the IPEMDS on the Malware dataset to compute the 
Detection Rate for each line alone and for both lines 
together (the whole IPEMDS), as shown in Table 3. Note that 
each set of malware contains different types belonging to 
the same malware class; for example, Trojan contains Agent, 
AddShare, AddUser, Adex, Adut, Affc, Adder, etc. 

2) Run the IPEMDS on a new Benign dataset to compute the 
False Alarm Rate, as shown in Table 4. 

3) Table 5 and Figure 4 compare the algorithms used in terms 
of the number of malware samples they can detect. 
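
The two performance measures can be computed from raw counts as in the following minimal sketch. The function names and the sample counts are assumptions chosen for illustration; they are not taken from the IPEMDS implementation, which is written in C#.

```python
def detection_rate(detected: int, total_malware: int) -> float:
    """Fraction of malware samples flagged as malware (true positive rate)."""
    if total_malware <= 0:
        raise ValueError("total_malware must be positive")
    return detected / total_malware


def false_alarm_rate(flagged_benign: int, total_benign: int) -> float:
    """Fraction of benign samples wrongly flagged as malware."""
    if total_benign <= 0:
        raise ValueError("total_benign must be positive")
    return flagged_benign / total_benign


# Hypothetical counts for one malware set and one benign set of 50 files each.
tpr = detection_rate(detected=49, total_malware=50)
far = false_alarm_rate(flagged_benign=4, total_benign=50)
print(f"detection rate = {tpr:.2f}, false alarm rate = {far:.2f}")
```

A good detector pushes the first value towards 1 and the second towards 0, which is the trade-off the two lines of defense are combined to achieve.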




[Figure residue: First Line of defense (Hash-Scan & TLR): Heuristic 
Analysis, Packing Detection, Hash Sequence, Global Alignment, 
Similarity and Difference Measures] 



Table 3: Detection Rate (TPR) of each Line alone and of the whole 
IPEMDS. 

Malware-Name     Size   TPR of       TPR of        TPR of all 
                        First-Line   Second-Line   IPEMDS 
Backdoor-set1    50     0.48         1             1 
Backdoor-set2    50     0.3          0.98          0.98 
Worm.Bagle       50     0.88         0.92          1 
Worm.Mydoom      51     0.76         1             1 
Trojan-set1      60     0.48         0.98          0.97 
Trojan-set2      60     0.27         0.93          0.93 
TPR-Average      321    0.53         0.968         0.98 



Table 4: False Alarm Rate (labelled TNR) of each Line alone and of 
the whole IPEMDS. 

Benign-sets   Size   TNR of       TNR of        TNR of all 
                     First-Line   Second-Line   IPEMDS 
1             50     0.08         0.68          0.08 
2             50     0.08         0.78          0.08 
3             50     0.16         0.78          0.16 
4             50     0.18         0.72          0.18 
5             50     0.12         0.72          0.1 
6             50     0.06         0.64          0.06 
Average       300    0.113        0.72          0.11 



Table 5: Number of malware samples detected by each algorithm 
alone. 

Malware-Name     Size   TLR   cDCA   dDCA1   dDCA2 
Backdoor-set1    50     24    27     49      50 
Backdoor-set2    50     14    23     48      48 
Worm.Bagle       50     44    21     46      50 
Worm.Mydoom      51     38    18     50      50 
Trojan-set1      60     29    31     58      60 
Trojan-set2      60     14    35     54      54 
Sum              321    136   155    305     312 




Fig 4: Malware detection curves comparing the four algorithms 
(TLR, cDCA, dDCA1, dDCA2) 
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Abstract: 

Image compression is used almost everywhere. 
Images on the Internet and in web pages are 
compressed, typically in the JPEG or GIF 
formats; most modems use compression; HDTV 
is compressed using MPEG-2; and several file 
systems automatically compress files when they 
are stored, while the rest of us do it by hand. 
Compression algorithms used in the real world 
make heavy use of a wide set of algorithmic 
tools, including sorting, hash tables, 
dictionaries and FFTs. Furthermore, algorithms 
with strong theoretical foundations and efficient 
implementations play a critical role in real-time 
applications. Many compression models are 
available, but only some of them will remain in 
use in the field. In this paper we concentrate on 
the zerotree wavelet method and propose an 
enhancement to it. 

Keywords: 

Introduction: 

Computer graphics applications, 
particularly those generating digital photographs 
and other complex color images, can generate 
very large file sizes. Issues of storage space, and 
the need to rapidly transmit image data across 
networks and over the Internet, have led to the 
development of a range of image compression 
techniques to reduce the physical size of files. 
Compression techniques are generally 
independent of specific file formats; indeed, 
many formats support several compression 
types. They are an essential part of digital image 
creation, transmission and storage. 

Most algorithms are particularly suited to 
specific circumstances, and these must be 
understood if the algorithms are to be used 
effectively. For example, some are more efficient 
at compressing monochrome images, whilst 
others yield better results with complex color 
images. 

Image compression algorithms fall into two 
main categories: 

• Lossy compression, which achieves its 
effect at the cost of a loss in image 
quality by removing some image 
information. 

• Lossless compression techniques, which 
reduce size whilst preserving all of the 
original image information and without 
degrading the quality of the image. 

Lossy compression techniques should be 
treated with caution. If images are repeatedly 
migrated over time between different lossy 
formats, the image quality will be increasingly 
degraded at each stage. However, in some 
circumstances the use of lossy compression may 
be required, for example, to enable very large 
volumes of high-quality color images to be 
managed economically. In such circumstances, 
visually-lossless compression should be used, 
which only removes image information which is 
invisible to the human eye at normal 
magnification. 

Some compression algorithms are patented and 
may only be used under license. Others have 
been developed as open standards. This can be 
an important consideration in terms of both 
creation costs and long-term sustainability. 

Earlier works: 

In our earlier work we worked with the 
RLE algorithm. RLE is a very simple form of 
data compression in which runs of data 
(sequences in which the same data value occurs 
in many consecutive data elements) are stored as 
a single data value and count, rather than as the 
original run. This is useful on data that contains 
many such runs, for example, relatively simple 
graphic images such as icons, line drawings, and 
animations. It is not useful with files that don't 
have many runs, as it could potentially increase 
the file size. Encoding algorithms are normally 
complex to understand. Many algorithms are 
available for encoding, and among them 
run-length encoding is one of the easiest to 
understand. A variety of methods are available 
to encode long runs. The following diagram 
shows these methods. Our earlier example was 
text based. In this paper we apply the technique 
to images. Run-length encoding produces good 
results on text but not on all types of image. We 
now try to incorporate this technique into RGB 
color images. An RGB color image always has 
three layers: red, green and blue. Each layer has 
its own byte values. If the color value of a pixel 
is the same as that of its neighboring pixel, both 
belong to the same run. We can then apply the 
run-length algorithm to the image. 

Algorithm Procedure: 



1. Input image 

2. Layer separation 

3. Apply RLE encoding at byte level 

4. Assign an alpha index value to each run length 

5. Build a linked list data structure to organize 
the RGB data 

6. Store the data structure at the top of the image 

7. Decoding 

In step one, an input image is taken for 
compression. Images come in a variety of forms, 
such as plain, patterned and highly patterned 
images. If the image is plain, RLE works 
efficiently. If the image is patterned, every 
compression algorithm performs somewhat 
worse. The input image is then transformed into 
a collection of pixels. Every pixel has three 
layers: red, green and blue. Every layer holds a 
color value between 0 and 255. The value is 
actually stored in binary form; for ease of 
understanding we convert it into integers and 
then apply our RLE scheme. 

Now we take a sample byte input from the image, 
such as 

77 77 77 87 87 87 87 22 22 22 11 11 44 44 44 44 44 65 65 

After encoding, this series is converted into 

377,487,322,211,544,265 

where each group is a run length followed by the 
byte value (for example, 377 means three 
occurrences of 77). 

If we stored these values directly inside the 
pixels they would take many bytes. To overcome 
this problem we use data structures: an index 
table, a data dictionary and a linked list. We 
assign an alpha index to every run. For example 
M-377 
N-487 
O-322 
P-211 
Q-544 
R-265 

After assigning the alpha index we have to form 
the linked list data structure for the pixel. It has 
four fields: 

1. Alpha Index 

2. Data dictionary 

3. Header 

4. Link 



The linked list is stored at the top of the picture 
so that we can retrieve the needed information, 
or part of it, efficiently. We have compared 
EARLE with many compression algorithms. The 
analysis shows that EARLE performed well 
compared with the other algorithms. Sample 
images and compression ratios are given in 
Appendix I and Appendix II (refer to the 
Appendix). 

Decompression is the reverse process of 
encoding. First, the linked list index is read; it 
gives the alpha index value, from which we 
obtain the run lengths. The header field gives 
the basic information about the pixel, and the 
data dictionary gives further information about 
the pixels. We can also access part of the picture 
information. 
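
The run-length step, its inverse, and the alpha index assignment can be sketched as follows. This is a minimal illustration on the sample byte sequence from the text, not the EARLE implementation: the function names are assumptions, and the linked list, header and data dictionary fields are left out.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (run_length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (run_length, value) pairs back into the original sequence."""
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out


sample = [77, 77, 77, 87, 87, 87, 87, 22, 22, 22,
          11, 11, 44, 44, 44, 44, 44, 65, 65]

runs = rle_encode(sample)
# Assign one letter per run, starting at "M" as in the example above.
alpha_index = {chr(ord("M") + i): run for i, run in enumerate(runs)}

print(runs)
print(alpha_index)
print(rle_decode(runs) == sample)
```

Nineteen input bytes collapse into six runs here; on an image with few repeated neighbors the list of runs can be longer than the input, which is why RLE suits plain images best.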
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Proposed Method (NAGA): 

The embedded zerotree wavelet (EZW) 
algorithm is a simple yet remarkably effective 
image compression algorithm. It operates on 
bits: the bits in the bit stream are generated in 
order of importance, which yields a fully 
embedded code. Using an embedded coding 
algorithm, an encoder can terminate the 
encoding at any point, thereby allowing a target 
rate or target distortion metric to be met. 
Likewise, the decoder can stop decoding at any 
point in the bit stream and still produce exactly 
the same image that would have been encoded 
at the bit rate corresponding to the truncated 
stream. In addition to producing a fully 
embedded bit stream, EZW consistently 
produces compression results that are 
competitive with virtually all known 
compression algorithms. 

Algorithm: 

• Read the image. 

• Convert it to gray scale. 

• Apply the wavelet transform. 

• Find the coefficients in the dominant and 
subordinate passes. 

• Form a matrix from the coefficients. 

• Allocate the priority index values based on 
the threshold (T) value. 

• Update the data dictionary. 

• Stop when the final threshold is reached. 

• Compression is achieved. 

• Decoding. 



A wavelet coefficient C is said to be 
insignificant with respect to a given threshold T 
if |C| < T. The zerotree is based on the 
hypothesis that if a wavelet coefficient at a 
coarse scale is insignificant with respect to a 
given threshold, then all wavelet coefficients of 
the same orientation in the same spatial location 
at finer scales are likely to be insignificant with 
respect to the same threshold T. More 
specifically, in a hierarchical subband system, 
with the exception of the highest frequency 
subbands, every coefficient at a given scale can 
be related to a set of coefficients at the next finer 
scale of similar orientation. The coefficient at 
the coarse scale is called the parent, and all 
coefficients corresponding to the same spatial 
location at the next finer scale of similar 
orientation are called children. Similarly, we can 
define the concepts of descendants and 
ancestors. 

Given a threshold T to determine whether or not 
a coefficient is significant, a coefficient x is said 
to be an element of a zerotree for the threshold T 
if it and all of its descendants are insignificant 
with respect to T. Therefore, given a threshold, 
any wavelet coefficient can be represented by 
one of four symbols: zerotree root (ZTR), 
isolated zero (IZ), positive significant (POS) and 
negative significant (NEG). 
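
The four-way labelling can be written down directly from these definitions. The sketch below is an assumption-laden illustration: the function name and the flat list of descendants are hypothetical, and it leaves out the rule, used in a full EZW coder, that descendants of an already coded zerotree root are not coded again.

```python
def classify(coefficient, descendants, threshold):
    """Label one wavelet coefficient for a dominant pass.

    coefficient: value of the coefficient being examined
    descendants: values of all its descendants at finer scales
    threshold:   current threshold T (a value is significant if |value| >= T)
    """
    if abs(coefficient) >= threshold:
        return "POS" if coefficient > 0 else "NEG"
    if all(abs(d) < threshold for d in descendants):
        return "ZTR"  # insignificant, and so are all descendants
    return "IZ"       # insignificant, but some descendant is significant


print(classify(40, [3, -2], 32))   # significant and positive
print(classify(-35, [1], 32))      # significant and negative
print(classify(5, [3, -2], 32))    # whole subtree insignificant
print(classify(5, [3, -50], 32))   # one descendant is significant
```

The compression gain comes from the third case: one ZTR symbol stands in for the coefficient and its entire subtree.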

The dominant pass finds the pixel 
values above a certain threshold, and the 
subordinate pass quantizes all significant pixel 
values found in this and all previous dominant 
passes. A dominant pass scans all trees for pixel 
values that are significant with respect to a 
certain threshold. The initial threshold is chosen 
to be one-half of the maximum magnitude of all 
pixel values. Each subsequent dominant pass 
uses a threshold that is one-half of the previous 
pass threshold. 
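
The threshold sequence follows directly from this rule. A minimal sketch, assuming the coefficients are available as a flat list; the function name and the sample values are hypothetical:

```python
def threshold_schedule(values, passes):
    """Return the thresholds used by successive dominant passes.

    The first threshold is half of the largest magnitude, and each
    later pass halves the previous threshold, as described in the text.
    """
    threshold = max(abs(v) for v in values) / 2
    schedule = []
    for _ in range(passes):
        schedule.append(threshold)
        threshold /= 2
    return schedule


print(threshold_schedule([63, -34, 49, 10], passes=3))
```

Many EZW descriptions round the first threshold to a power of two; the sketch follows the wording above instead.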

When an insignificant pixel value is 
found, and a check of all its children reveals that 
they too are insignificant, then it is possible to 
encode that pixel and all its children with one 
symbol, a zerotree root, in place of a symbol for 
that pixel and a symbol for each of that pixel's 
children, thus achieving compression. Pixel 
values found to be significant in the dominant 
pass are encoded with the symbol positive, for a 
value greater than zero, or negative, for a value 
less than zero. Those pixel values are then added 
to a subordinate list for quantization, and the 
pixel value in the subband is set to zero for 
the next dominant pass. Pixel values that are 
found to be insignificant in the dominant pass 
but have significant children are coded as 
isolated zeros. So, the dominant passes map 
pixel values to a four-symbol alphabet, which 
can then be further encoded by using an adaptive 
arithmetic coder. 





Figure: Coefficients are coded in a zerotree structure and 
scanned in a left-to-right order. 



After each dominant pass, a subordinate 
pass is performed on the subordinate list, 
which contains all pixel values previously found 
to be significant. In the subordinate pass we 
introduce a new label allocation scheme based 
on threshold convergence. For example, if a 
value in the subordinate pass is close to the 
threshold value, we allocate a higher priority 
symbol; otherwise we allocate a lower priority 
symbol, according to how far the threshold has 
decreased. Since the initial threshold is one-half 
of the maximum magnitude of all pixel values 
for the first dominant pass, in the first 
subordinate pass only two ranges are specified 
in which a significant pixel value could lie: the 
upper half and the lower half of the range 
between the maximum pixel value and the 
initial threshold. 

A pixel value in the upper half of the 
range is coded with the high priority symbol, 
and a pixel value in the lower half is coded 
with the low priority symbol. A pixel value 
found to be in a particular range is quantized, 
from the decoder's viewpoint, to the midpoint 
of that range. In subsequent subordinate passes 
the threshold has been halved, so there are twice 
as many ranges as in the last subordinate pass, 
plus two new ranges corresponding to the new 
lower threshold. At every halving we allocate 
new priority symbols based on threshold 
convergence. Which symbols are allocated does 
not matter, but we have to maintain a data 
dictionary describing the coding, and it should 
be available in the starting pixels of the image. 
If the group of coded letters is present, the 
decoder recognizes that this compression 
scheme has been used. By reading the 
subordinate symbol from the dictionary 
corresponding to a significant pixel and knowing 
the threshold, the decoder is able to determine 
the range in which the pixel lies and reconstructs 
the pixel value to the midpoint of that range. 
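
One refinement step of the subordinate pass, and the decoder's midpoint reconstruction, can be sketched as below. The symbols 1 and 0 stand in for the high and low priority symbols; they, the function names and the sample values are assumptions for illustration, not the symbol set of the proposed method.

```python
def refine(magnitude, low, high):
    """Split the range [low, high) in half and report where magnitude lies.

    Returns (symbol, new_low, new_high); symbol 1 marks the upper half
    (high priority) and symbol 0 the lower half (low priority).
    """
    mid = (low + high) / 2
    if magnitude >= mid:
        return 1, mid, high
    return 0, low, mid


def reconstruct(low, high):
    """Decoder's estimate: the midpoint of the range the value lies in."""
    return (low + high) / 2


# Assume a maximum magnitude of 64, so the initial threshold is 32 and a
# value found significant in the first pass lies between 32 and 64.
symbol_a, low_a, high_a = refine(49, 32, 64)
symbol_b, low_b, high_b = refine(34, 32, 64)
print(symbol_a, reconstruct(low_a, high_a))
print(symbol_b, reconstruct(low_b, high_b))
```

Each further pass halves the range again, so the reconstruction error for a coded value shrinks by a factor of two per pass.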

Conclusion: 

In this paper we have presented an 
enhanced zerotree wavelet method based on a 
priority index. In our earlier work we used a 
run-length encoding scheme for our 
implementation. We carried out a comparative 
analysis against our EARLE work, and the 
results are positive. We hope this effort will be 
useful to others in their further research. 
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Appendix: II 



File Name   Size      TIFF   GIF   EARLE   NAGA 
Stars       2232 kb   74%    73%   80%     84% 
Design      1232 kb   71%    75%   83%     87% 
Nature      80 kb     65%    81%   81%     84% 
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Abstract- This research work is motivated by the 

need to achieve low latency in an input-queued, centrally 
scheduled cell switch for high-performance computing 
applications; specifically, the aim is to reduce the latency 
incurred between the issuance of a request and the arrival of 
the corresponding grant. The minimum latency in switches with 
centralized scheduling comprises two components, namely, 
the control-path latency and the data-path latency, which in 
a practical high-capacity, distributed switch implementation 
can be far greater than the cell duration. We introduce a 
speculative transmission scheme to significantly reduce the 
average control-path latency by allowing cells to proceed 
without waiting for a grant, under certain conditions. It 
operates in conjunction with any centralized matching 
algorithm to achieve a high maximum utilization. Using a 
model of this scheme, performance measures such as the mean 
delay and the rate of successful speculative transmissions are 
derived. 



II. SPECULATIVE TRANSMISSION SCHEME ARCHITECTURE: 

Our objective is to eliminate the control-path latency in the 
absence of contention. To this end, we introduce a 
speculative transmission (STX) scheme. The principle behind 
STX is related to that of the original ALOHA and Ethernet 
protocols: senders compete for a resource without prior 
scheduling. If there is a collision, the losing sender(s) must 
retry their data transmissions in a different time slot. 
However, the efficiency of ALOHA-like protocols is very poor 
(18.4% for pure ALOHA and 36.8% for slotted ALOHA) because 
under heavy load many collisions occur, reducing the 
effective throughput. Therefore, we propose a novel method 
to combine scheduled and speculative (non-scheduled) 
transmissions in a crossbar switch. The objective is to achieve 
reduced latency at low utilization owing to the speculative 
mode of operation and achieve high maximum throughput 
owing to the scheduled mode of operation. 
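
The two efficiency figures quoted for ALOHA are the peaks of the classical throughput formulas S = G·e^(-2G) for pure ALOHA and S = G·e^(-G) for slotted ALOHA, where G is the offered load in frames per frame time. The sketch below evaluates both at their maxima; the function names are illustrative, and the formulas are the standard textbook model rather than part of the STX scheme.

```python
import math


def pure_aloha_throughput(offered_load: float) -> float:
    """Throughput of pure ALOHA for offered load G (frames per frame time)."""
    return offered_load * math.exp(-2 * offered_load)


def slotted_aloha_throughput(offered_load: float) -> float:
    """Throughput of slotted ALOHA for offered load G."""
    return offered_load * math.exp(-offered_load)


# The maxima occur at G = 0.5 (pure) and G = 1 (slotted).
print(f"pure ALOHA peak:    {pure_aloha_throughput(0.5):.3f}")
print(f"slotted ALOHA peak: {slotted_aloha_throughput(1.0):.3f}")
```

Beyond these loads the throughput falls again as collisions dominate, which is the behavior the scheduled mode is meant to avoid at high utilization.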



Keywords: speculative transmissions (STX), collisions, 
crossbar switch, cache table, control-path latency. 

I. Introduction 

A key component of massively parallel computing 
systems is the interconnection network (ICTN). To achieve a 
good system balance between computation and 
communication, the ICTN must provide low latency, high 
bandwidth, low error rates, and scalability to high node 
counts (thousands), with low latency being the most 
important requirement. Although optics holds strong 
promise for fulfilling these requirements, a number of 
technical and economic challenges remain. Corning Inc. and 
IBM are jointly developing a demonstrator system to solve 
the technical issues and map a path towards 
commercialization. 



Crossbar switch area, latency, and power are computed using 
manual floor planning, standard cell selection, and wiring 
estimation. Taking into account output driver capacitance 
and switch wiring capacitance by annotating specific nets, 
Verilog synthesis is then performed for the remaining router 
logic structures. Estimated frequency parameters come from 
this Verilog delay coupled with timing models for the other 
structures. Finally, with energy and delay models for all 
router components, a cycle-accurate C++ simulation model is 
complemented with the necessary event counters to form an 
accurate power model. 

The main contribution of this work is a hybrid crossbar 
scheduling scheme that combines scheduled and speculative 
modes of operation, such that at low utilization most cells 
can proceed speculatively without waiting for a grant, thus 
achieving a latency reduction of up to 50%. In contrast, 
on-demand protocols attempt to discover a route only when a 
route is needed. To reduce the overhead and the latency of 
initiating a route discovery for each packet, on-demand 
routing protocols use route caches. Due to mobility, cached 
routes easily become stale. Using stale routes causes packet 
losses, and increases latency and overhead. We investigate 
how to make on-demand routing protocols adapt quickly to 
topology changes. This problem is important because such 
protocols use route caches to make routing decisions; it is 
challenging because topology changes are frequent. 





Performance of a Speculative Transmission Scheme for Scheduling-Latency Reduction 
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Fig.l Speculative Transmission Scheme for Scheduling-Latency Reduction 



We propose proactively disseminating the broken link 
information to the nodes that have that link in their caches. 
Proactive cache updating is key to making route caches adapt 
quickly to topology changes. It is also important to inform 
only the nodes that have cached a broken link to avoid 
unnecessary overhead. Thus, when a link failure is detected, 
our goal is to notify all reachable nodes that have cached the 
link about the link failure. 



CACHE TABLE 

It was shown that no single cache size provides the best 
performance for all mobility scenarios. Thus, we can design a 
cache table that has no capacity limit. The cache size 
increases as new routes are discovered and decreases as 
stale routes are removed. 

There are four fields in a cache table entry: 

1. Route: It stores the links starting from the current node to 
a destination or from a source to a destination. 

2. Source Destination: It is the source and destination pair. 

3. Data Packets: It records whether the current node has 
forwarded 0, 1, or 2 data packets. It is initially 0, incremented 
to 1 when the node forwards the first data packet, and 
incremented to 2 when it forwards the second data packet. 

4. Reply Record: This field may contain multiple entries and 
has no capacity limit. A Reply Record entry has two fields: the 
neighbor to which a ROUTE REPLY is forwarded and the route 
starting from the current node to a destination. 
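
The four fields map naturally onto a small record type. The following sketch is one possible representation, assuming node identifiers are strings; the class names, field names and the sample route are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReplyRecordEntry:
    neighbor: str     # neighbor to which a ROUTE REPLY was forwarded
    route: List[str]  # route from the current node to a destination


@dataclass
class CacheEntry:
    route: List[str]                     # nodes along the cached route
    source_destination: Tuple[str, str]  # (source, destination) pair
    data_packets: int = 0                # 0, 1 or 2 data packets forwarded
    reply_record: List[ReplyRecordEntry] = field(default_factory=list)

    def note_forwarded_packet(self) -> None:
        """Count a forwarded data packet; the counter saturates at 2."""
        self.data_packets = min(self.data_packets + 1, 2)


entry = CacheEntry(route=["C", "D", "E"], source_destination=("A", "E"))
for _ in range(3):
    entry.note_forwarded_packet()
print(entry.data_packets)
```

Because the table has no capacity limit, a plain list of such entries is enough for a sketch; entries are only removed when they contain a broken link.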



Route 


S-D 


DP 


ReplyRecord 


CDE 


AE 





3^ CDE 



Fig 2- Example of Cache Table 



A. DETAILS OF ALGORITHM 

The Distributed Cache Update Algorithm 

We present the distributed cache update algorithm. We 
define a broken link as a forward or backward link. A broken 
link is a forward link for a route if the flow using the route 
crosses the link in the same direction as the flow detecting 
the link failure; otherwise, it is a backward link. For these two 
types of links, the operation of the algorithm is symmetric. 
On-demand Route Maintenance results in delayed awareness 
of mobility, because a node is not notified when a cached 
route breaks until it uses the route to send packets. 

We classify a cached route into three types: 

1. pre-active, if a route has not been used; 

2. active, if a route is being used; 

3. post-active, if a route was used before but now is 
not. 

It is not necessary to detect whether a route is active or post- 
active, but these terms help clarify the cache staleness issue. 
Stale pre-active and post-active routes will not be detected 
until they are used. Because nodes respond to ROUTE 
REQUESTS with cached routes, stale routes may be quickly 
propagated to the caches of other nodes. Thus, pre-active 
and post-active routes are important sources of cache 
staleness. 

1) Detailed Description: The algorithm starts either when a 
node detects a link failure or when it receives a notification. 
In either case, the algorithm generates a notification list, 
which is a list of neighborhood nodes that need to be 
notified. Each entry in this list includes a node and a cached 
route to reach that node. A notification will be sent as a 
ROUTE ERROR. When a node detects a link failure, the 
algorithm checks each entry of the node's cache. If a route 
contains a forward link, the algorithm does the following 
steps: 

1. If DataPackets is 0, indicating that the route is pre-active, 
then no downstream node needs to be notified because the 
downstream nodes did not cache the link when forwarding a 
ROUTE REPLY. 

2. If DataPackets is 1 or 2, then the upstream nodes need to 
be notified, because at least one data packet has reached 
the node and hence the upstream nodes have cached the 
broken link. The algorithm adds the upstream neighbor to 
the notification list. 

3. If DataPackets is 2, or if DataPackets is 1 and the route 
being examined is different from the source route in 
the packet, then the downstream nodes need to be 
notified, because at least one data packet has traversed 
the route and hence the downstream nodes have 
cached the link. 
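
The three DataPackets rules for a detected forward link reduce to two boolean decisions. The sketch below encodes only those rules; the function and parameter names are hypothetical, and the extra conditions a full implementation needs (for example, that the detecting node is not the first node of the route) are deliberately omitted.

```python
def neighbors_to_notify(data_packets: int, differs_from_packet_route: bool):
    """Decide which side of a detected forward link must be notified.

    data_packets: the DataPackets field of the cache entry (0, 1 or 2)
    differs_from_packet_route: True if the cached route being examined is
        different from the source route carried in the packet
    Returns (notify_upstream, notify_downstream).
    """
    notify_upstream = data_packets in (1, 2)
    notify_downstream = data_packets == 2 or (
        data_packets == 1 and differs_from_packet_route
    )
    return notify_upstream, notify_downstream


for dp, differs in [(0, False), (1, False), (1, True), (2, False)]:
    print(dp, differs, neighbors_to_notify(dp, differs))
```

A pre-active route (DataPackets equal to 0) triggers no notification in either direction under these rules, since no data packet has carried the link into other caches.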
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Fig 3 - Example of Distributed Cache Updating 



B. PSEUDO CODE 

Pseudo Code for the Distributed Adaptive Cache Update 
Algorithm 

Algorithm: cacheUpdate 

Input: ID from, ID to, PACKET p, boolean detectedByMe, 
boolean continueToNotify 

/* If p is a ROUTE ERROR and p.src = from and netID = tellID, 
then continueToNotify is set to TRUE. */ 

Output: vector<NotifyEntry*> notifyList 

1 for each entry e ∈ cacheTable do 
2   if link(from, to) ∈ e.route then 
3     hasBrokenLink := TRUE; 
4     direction := forward 
5   else if link(to, from) ∈ e.route then 
6     hasBrokenLink := TRUE; 
7     direction := backward 
8   else hasBrokenLink := FALSE; 
9   if hasBrokenLink then 
10    position := Index(e.route, from); 
11    if detectedByMe then 
12      if direction = forward then 



42 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 10, No. 2, February 2012 



13        if (e.DP = 1 or e.DP = 2) and (not isFirstNode(e.route, netID)) then 
14          notifyList := notifyList ∪ {(route[position−1], (netID ∥ e.route[position−1]))} 
15        if e.DP = 2 or (e.DP = 1 and (not (p is a data packet and (p.srcRoute = e.route)))) then 
16 & 17     routeToUse := ∅; 
18          Try to find a shortest cached route to n; 
19          if such a route is found then 
20            foundRoute := the found route; 
21            if routeToUse = ∅ or |foundRoute| < |routeToUse| then 
22              routeToUse := foundRoute; 
23              tellID := n 
24          if routeToUse ≠ ∅ then 
25            notifyList := notifyList ∪ {(tellID, routeToUse)} 
26      else if direction = backward then 
27        if not isLastNode(e.route, netID) then 
28          notifyList := notifyList ∪ {(route[position+1], (netID ∥ e.route[position+1]))} 
29        routeToUse := ∅; 
30        for each node n ∈ {e.route[position−1], ..., e.route[0]} do 
31          Try to find a shortest route to n in the cache table; 
32          if such a route is found then 
33            foundRoute := the found route; 
34            if routeToUse = ∅ or |foundRoute| < |routeToUse| then 
35              routeToUse := foundRoute; 
36              tellID := n 
37        if routeToUse ≠ ∅ then 
38          notifyList := notifyList ∪ {(tellID, routeToUse)} 
39    else /* the node receives a notification. */ 
40      index := Index(e.route, netID); 
41      if direction = forward and index < position and 
        (not isFirstNode(e.route, netID)) then 
42        notifyList := notifyList ∪ {(route[index−1], (netID ∥ e.route[index−1]))} 
43      if direction = backward and index > position and 
        (not isLastNode(e.route, netID)) then 
44        notifyList := notifyList ∪ {(route[index+1], (netID ∥ e.route[index+1]))} 
45      if (e.DP = 1 or e.DP = 2) and ((direction = forward and index > position) 
        or (direction = backward and index < position)) then 
46        if continueToNotify then 
47          if (direction = forward and netID = to and 
            (not isLastNode(e.route, netID))) or (direction = backward and 
            isFirstNode(e.route, netID) and (not netID = to)) then 
48            notifyList := notifyList ∪ {(route[index+1], (netID ∥ e.route[index+1]))} 
49          if (direction = forward and isLastNode(e.route, netID) and 
            (not netID = to)) or (direction = backward and 
            netID = to and (not isFirstNode(e.route, netID))) then 
50            notifyList := notifyList ∪ {(e.route[index−1], (netID ∥ e.route[index−1]))} 

51          if not (netID = to or (direction = forward and isLastNode(e.route, netID)) 
            or (direction = backward and isFirstNode(e.route, netID))) then 
52            notifyList := notifyList ∪ {(route[index+1], (netID ∥ e.route[index+1]))}; 

Pseudo Code for the Distributed Adaptive Cache Update 
Algorithm 
(11-25 forward link) (26-38 backward link) 



If a route contains a backward link (lines: 26-38), which 
means the link to the previous hop in the route is broken, the 
algorithm adds the downstream neighbor to the notification 
list. Since the node has forwarded at least one data packet 
using the route, the downstream nodes have cached that 
link. The upstream nodes also need to be notified. The 
algorithm searches the cache to find a shortest route to reach 
one of the upstream nodes. If it finds such a route, it adds 
that upstream node to the notification list. 
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Fig 4 - How the algorithm operates based on the Reply Record field 



When a node detects a link failure, the algorithm does the 
above operation to add the closest upstream and/or 
downstream nodes to the notification list. If a node learns 
through a notification that a link is broken, it is responsible 
for notifying its upstream and/or downstream neighbors. 

The algorithm determines the neighbors to be notified based 
on the position of the node in a route and whether the link is 
a forward or a backward link (lines: 39-53): 

1. If the link is a forward link, and the node is upstream to it 
but not the source node, then the algorithm adds the 
upstream neighbor to the notification list. If the link is a 
backward link, and the node is downstream to it but not the 
destination, then the algorithm adds the downstream 
neighbor to the notification list. 

2. If the link is a forward link, and the node is downstream to 
it and receives a notification from the upstream endpoint of 
the broken link, then there are three cases: (1) If the node is 
the other endpoint of the link, then the algorithm adds its 
downstream neighbor to the notification list; (2) If the node 
is the destination, then the algorithm adds its upstream 
neighbor to the notification list; (3) Otherwise, the algorithm 
adds both the upstream and downstream neighbors to the 
notification list. 

3. If the link is a backward link, and the node is upstream to it 
and receives a notification from the downstream endpoint of 
the broken link, then there are three cases: (1) If the node is 
the other endpoint of the link, then the algorithm adds its 
upstream neighbor to the notification list; (2) If the node is 
the source, then the algorithm adds its downstream neighbor 
to the notification list; (3) Otherwise, the algorithm adds both 
the upstream and downstream neighbors to the notification 
list. 

After adding the upstream and/or downstream neighbors to 
the notification list, the algorithm checks the 
ReplyRecord field. If an entry contains a broken link, the 
algorithm adds the neighbor that learned the link to the 
notification list (lines: 54-58). The algorithm then removes 
the cache table entry containing the broken link (line: 59). If a 
node detects a link failure when attempting to send a ROUTE 
REPLY, the algorithm removes the corresponding 
ReplyRecord entry (lines: 61-63). Finally, the algorithm 
removes duplicate nodes from the notification list. Duplicate 
nodes may occur in the list when the node is on multiple 
routes containing a broken link. The algorithm also removes 
the node that is the source node of a notification, since the 
algorithm adds both upstream and downstream neighbors to 
the notification list for the node that receives a notification 
from its upstream or downstream neighbor (lines: 51-53). 
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Fig 5 -Distributed cache updating for the dynamic source routing protocol 



IV.RELATEDWORK 

There are alternative ways to avoid the scheduling latency 
issue described above. The main options are: 1) bring the 
scheduler closer to the adapters; 2) use provisioning (circuit 
switching); 3) use a buffered switch core; or 4) eliminate the 
scheduler altogether. Although one can attempt to locate the 
scheduler as close to the adapters as possible, a certain 
distance determined by the system packaging limitations and 
requirements will remain. Although the RTT (round-trip time) 
can be minimized, the fundamental problem of non-negligible 
RTTs remains. One can also do without cell-level allocation 
and rely on provisioning to resolve contention. This approach 
has several well-known drawbacks, such as a lack of flexibility, 
inefficient use of resources, and long set-up times when a new 
connection is needed, which make it unattractive for parallel 
computer interconnects. An alternative approach is to 
provide buffers in the switch core and employ some form of 
link-level flow control (e.g., credits) to manage them. As long 
as an adapter has credits, it can send immediately without 
having to go through a centralized scheduling process. 
However, as optical buffering technology is currently neither 
practically nor economically feasible and the key objective of 
OSMOSIS is to demonstrate the use of optics, this is not an 
option. 
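
To make the credit idea concrete, here is a minimal sketch of link-level credit flow control as described above. The class and method names are hypothetical, and one credit is assumed to correspond to one free buffer slot in the switch core.

```python
class CreditedLink:
    """Sender-side view of a link guarded by credits (illustrative only)."""

    def __init__(self, buffer_slots):
        # Assumption: one credit per free buffer slot at the receiver.
        self.credits = buffer_slots

    def try_send(self):
        """Send immediately if a credit is available; no scheduler involved."""
        if self.credits == 0:
            return False          # must wait for a credit to come back
        self.credits -= 1
        return True

    def on_credit_return(self):
        """Called when the receiver frees a buffer slot."""
        self.credits += 1
```

The sender never waits for a request/grant round trip; it only stalls when the downstream buffer is known to be full.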

The last alternative is the load-balanced Birkhoff-von 
Neumann switch, which eliminates the scheduler entirely. It 
consists of a distribution and a routing stage, with a set of 
buffers at the inputs of the second stage. Both stages are 
reconfigured periodically according to a sequence of 
permutation matrices. The first stage uniformizes the traffic 
regardless of destination and the second stage performs the 
actual switching. Its main advantage is that, despite being 
crossbar-based, no centralized scheduler is required. 
Although this architecture has been shown to have 100% 
throughput under a technical condition on the traffic, it 
incurs a worst-case latency. 



A. PERFORMANCE OF A GENERAL PREFETCH SCHEME 



A majority of Web servers and clients use the Hypertext 
Transfer Protocol (HTTP), which has several cache control 
features. The basic cache mechanism in HTTP/1.1 uses 
origin server-specified expiration times and validators, as 
described below. 

The "expiration" caching mechanism expects that origin 
servers will use the Expires header (or the max-age directive 
of the Cache-Control header) to assign future explicit 
expiration times to responses. Before the expiration time is 
reached, the document is not likely to change. If the origin 
servers do not provide explicit expiration times, an HTTP 
cache can use other header values (such as the 
Last-Modified time) to estimate a plausible expiration time. 

The Last-Modified entity-header field value is often used as a 
cache "validator". When an origin server generates a full 
response, it attaches the validator to the response, which is 
kept with the cache entry. When a cache finds that a cached 
entry that a client is requesting has already expired, it makes 
a conditional request that includes the associated validator to 
the origin server. The origin server responds with a short 
code "Not Modified" (no entity-body) to validate that the 
cached entry is still usable if the entity has not been modified 
since the Last-Modified time; otherwise, it returns a full 
response including the entity-body. Thus, it avoids 
transmitting the full response if the validator matches, and 
avoids an extra round trip if it does not match. 

In order to determine whether a cached entry is fresh, a 
cache needs to know if its current age has exceeded its 
freshness lifetime. The current age is an estimate of the time 
elapsed since the response was generated at the origin 
server. The freshness lifetime is the length of time between 
the generation of a response and its expiration time. 
HTTP/1.1 requires the origin server to send a Date header 
with every response, giving the time when the response was 
generated. The expiration judgement is performed in the 
cache when a cached entry is requested by a client: 

entry_is_fresh = (freshness lifetime > current age)    (1) 

If the cached entry is fresh, then the cache sends the entry to 
the client; otherwise, it sends a conditional request with the 
associated validator to the origin server. The validation check 
is performed in the origin server: 

Not Modified = (Validator = Last-Modified time)    (2) 
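
The two checks in (1) and (2) can be sketched in a few lines. This is a simplified illustration only: the dictionary keys and return strings are hypothetical, times are plain numbers in seconds, and the validator is assumed to be a Last-Modified timestamp.

```python
def entry_is_fresh(freshness_lifetime, current_age):
    """Expression (1): evaluated in the cache."""
    return freshness_lifetime > current_age


def not_modified(validator, last_modified):
    """Expression (2): evaluated in the origin server."""
    return validator == last_modified


def handle_request(entry, origin_last_modified):
    """Decide how a cache answers a client request for a cached entry."""
    if entry_is_fresh(entry["freshness_lifetime"], entry["current_age"]):
        return "serve from cache"
    # Entry expired: send a conditional request carrying the validator.
    if not_modified(entry["validator"], origin_last_modified):
        return "Not Modified: reuse cached entry"
    return "full response from origin"
```

Only the last branch transfers the entity-body again, which is why validation is cheap when the document has not changed.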

The caching in HTTP/1.1 is shown in Fig. 1, where: 

Expiration time is the time at which the origin server intends 
that an entity should no longer be returned by a cache 
without further validation. 

Current age is the time since the response was sent by, or 
successfully validated with, the origin server. 

Freshness lifetime is the length of time between the 
generation of a response and its expiration time. 

Validator is a protocol element (e.g., an entity tag or a 
Last-Modified time) that is used to find out whether a cache 
entry is an equivalent copy of an entity. 

Date is the value of the origin server's Date: header. 

Now is the current (local) time at the host performing the 
calculation. 

request_time is the (local) time when the cache made the 
request that resulted in this cached response. 

response_time is the (local) time when the cache received the 
response. 
Fig. 1. Caching in HTTP/1.1 
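
For reference, the quantities defined above are the inputs to the age calculation in the HTTP/1.1 specification (RFC 2616, section 13.2.3). The sketch below follows that calculation; `age_value` (the value of the Age header, 0 if absent) is not mentioned in the text above and is included here only because the specification uses it. All times are assumed to be in seconds.

```python
def current_age(age_value, date_value, request_time, response_time, now):
    """Current age of a cached response, following RFC 2616 sec. 13.2.3."""
    apparent_age = max(0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time
    return corrected_initial_age + resident_time
```

A cache would compare the result with the freshness lifetime as in (1).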



Based on the HTTP/1.1 caching mechanism, we build an 
analysis model of caching, as shown in Fig. 2. Document n is 
modified in its origin server with cycle m_n. The first request 
from a client to the cache in a given cycle cannot be satisfied 
by the cache (i.e., it "misses" a fresh copy) and a copy of the 
document must be fetched from the origin server. The 
subsequent requests in the cycle are satisfied by the cache 
with the cached copy (i.e., they "hit" its fresh copy). If a 
request arrives at the cache between the expiration time and 
the end of the cycle, the cache must validate the cached copy 
before using it. The inter-arrival time of requests to 
document n is governed by a distribution f_n(t). Because 
origin servers specify the expiration time based on their 
estimate or schedule of the next modification time (i.e., the 
end of the current modification cycle), the interval between 
the expiration time and the end of the cycle may have a 
stochastic or deterministic distribution. To reduce access 
traffic, origin servers aim to shorten this interval. 
Now we define the variables required in the following 
analysis (see TABLE I). In the definition of the variables, we 
assume the total rate R of access traffic to the Internet from 
a given Intranet is finite and given by: 



$$R = \sum_{n=1}^{N} R_n \qquad (3)$$

and the ratio 

$$r_n = \frac{R_n}{R} \qquad (4)$$

represents the probability of access to document n, n = 1, ..., N. 

We assume that the inter-arrival time of requests to 
document n is exponentially distributed: 

$$f_n(t) = R_n e^{-R_n t}, \quad n = 1, \ldots, N \qquad (5)$$



Then the probability g_n that there is at least one request to 
document n during a given modification cycle is given by 

$$g_n = 1 - e^{-R_n m_n}, \quad n = 1, \ldots, N \qquad (6)$$



Suppose we observe k modification cycles. Then there will be 
on average k g_n such cycles in which at least one request is 
made to document n. The first request in a given cycle will 
miss a fresh copy of the document and must fetch it from the 
origin server. The subsequent requests in the cycle use the 
cached copy. 
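
The quantities used so far can be evaluated numerically. The sketch below assumes the exponential inter-arrival model stated above, with the access probability taken as the ratio R_n / R and the per-cycle request probability as 1 - exp(-R_n m_n); the function names are illustrative.

```python
import math


def access_probability(rates):
    """Share of the total access rate R that goes to each document."""
    total = sum(rates)
    return [r / total for r in rates]


def prob_at_least_one_request(rate, cycle):
    """Probability of one or more requests within a modification cycle."""
    return 1.0 - math.exp(-rate * cycle)


def expected_origin_fetches(rate, cycle, k):
    """Expected number of cycles, out of k, whose first request misses."""
    return k * prob_at_least_one_request(rate, cycle)
```

A rarely requested document with a short modification cycle has a small per-cycle request probability, so most of its cycles pass without any fetch from the origin server.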
n during the k cycles, n = 1, 2, ..., N. 

Among the hit requests, some may arrive during the interval 
between the expiration time and the end of the cycle. They 
require the cache to validate the cached copy. Since the 
response with the special code "Not Modified" is a short 
message, its transmission time is small. For simplicity, we 
omit this delay or assume that it is included in the average 
delay T_{n,c} incurred when a valid copy of the document is 
found in the cache. 

Now we derive general expressions for the average latency 
and hit probability of a generic prefetch scheme. Because the 
cache capacity is limited, no prefetch scheme can cache all 
documents. Suppose that r documents are prefetched to the 
cache. By "prefetch" we mean the action that a proxy Web 
server takes by automatically caching and updating the r 
documents once they have expired; this action is not driven 
by client requests. The average latency L for a prefetch 
scheme is given by 

(8) 

where r_n is the access probability given by (4), p_n is the hit 
rate given by (7), T_{n,c} is the response time from the cache, 
T_{n,s} is the response time from the origin server, and h_n is 
defined by 

(9) 

As is seen from (8), the average latency consists of two 
terms. The first term is solely determined by the document 
access rates {R_n}, modification cycles {m_n} and response 
times {T_{n,c}, T_{n,s}}, and is independent of the specific 
choice of prefetch scheme; it is the latency of a "no-prefetch" 
cache scheme (i.e., conventional caching). The second term is 
the latency reduction obtained by prefetching. This reduction 
is determined by the number r and the selection of the r 
prefetch documents, and so depends on the specific prefetch 
scheme. The total hit probability is given by 
where R_n is the access rate to document n, R is the total 
access rate, r_n is the access probability given by (4), and \n is 
defined by 



f.=T.(l-A), H2 » 



(11) 



Now we proceed to derive expressions for the required cache 
capacity and bandwidth. Whenever a request misses a 
"fresh" copy of document n in the cache, the cache must 
fetch the current version of the document from the 
origin server. Thus, the fetching rate for document n in the 
cache is (1 - p_n) R_n. 

The freshness lifetime of the document will be m_n - t_1, where 
t_1 is the origin server's "Date" in terms of HTTP/1.1, i.e., the 
time of the first request since the present modification cycle 
started. Thus, the average freshness lifetime of the 
document is given by 



Vu/m' 
where f_n(t) is given by (5), and g_n is given by (6). The fetched 
document is expected to stay in the cache for the 
interval of the average freshness lifetime. In a prefetching 
scheme, the selected r documents are prefetched into the 
cache, while the other requested documents are dynamically 
stored in the cache as is done in the ordinary cache scheme. 
Therefore, the required cache capacity C is 



The total transmission rate required for transmitting 
documents from the origin servers to the cache should be 
Thus, the improvement in the average latency and the hit 
probability is achieved at the expense of increased 
bandwidth usage. 
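
As a worked check of the average freshness lifetime discussed above, the following derivation takes the freshness lifetime m_n - t_1, assumes that t_1 has the exponential density f_n(t) of (5), and conditions on at least one request arriving within the cycle (probability g_n from (6)). It is a reconstruction from those stated assumptions rather than a transcription of the original equation, and the symbol \bar{F}_n is introduced here only for illustration.

```latex
\begin{aligned}
\bar{F}_n
  &= \frac{1}{g_n} \int_0^{m_n} (m_n - t)\, f_n(t)\, dt
   = \frac{1}{g_n} \int_0^{m_n} (m_n - t)\, R_n e^{-R_n t}\, dt \\
  &= \frac{1}{g_n} \left( m_n - \frac{1 - e^{-R_n m_n}}{R_n} \right)
   = \frac{m_n}{g_n} - \frac{1}{R_n}.
\end{aligned}
```

Two limits give a quick sanity check: for a very popular document (R_n large) g_n approaches 1 and the result approaches m_n, while for a rarely requested document (R_n small) it approaches m_n / 2, as expected when the first request is roughly uniform over the cycle.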



V. Conclusion 

This work is motivated by the need to achieve low 
latency in an input-queued centrally-scheduled cell switch for 
high-performance computing applications; specifically, the 
aim is to reduce the latency incurred between issuance of a 
request and arrival of the corresponding grant. 

The proposed solution features a combination of speculative 
and scheduled transmission modes, coupling the advantage 
of uncoordinated transmission, namely low latency because 
there is no need to wait for a grant, with that of coordinated 
transmission, namely high maximum utilization. 
Abstract - An ad hoc wireless network consists of a set of mobile 
nodes connected without any central administration. Path finding 
in on-demand route discovery methods for mobile ad hoc 
networks (MANETs) uses flooding: the source node broadcasts a 
route request (RREQ) packet to its neighbours, and each 
neighbour rebroadcasts the RREQ to its own neighbours until a 
route to the destination is found. Excessive RREQ packets can 
cause collisions, consume bandwidth and degrade network 
performance. This paper proposes a new probabilistic route 
discovery method that reduces the number of RREQ packets 
generated by the mobile nodes during the path finding process. 
The method is evaluated in simulation using the Ad hoc 
On-demand Distance Vector (AODV) routing protocol [1]. 
Compared with traditional AODV, the modified AODV shows a 
significant improvement in terms of routing overhead and 
end-to-end delay. 

Keywords: MANET, AODV, broadcast. 



I. INTRODUCTION 

Ad hoc wireless networks are multi-hop networks that 
operate without the support of any fixed infrastructure; 
hence this type of network is called an infrastructure-less 
network. The absence of any central coordinator makes 
routing very difficult. The path between two nodes is set up 
with the help of intermediate nodes. Routing is the 
responsibility of the routing protocol, which includes 
exchanging route information; finding a good path to a 
destination based on routing metrics such as hop length, 
minimum power and lifetime of the links; collecting 
information about path breaks; restoring broken paths with 
little processing power and bandwidth; and utilizing 
minimum bandwidth. Routing protocols face many 
challenges, such as mobility, bandwidth constraints, an 
error-prone shared channel and location-dependent 
contention. The major requirements of a routing protocol in 
ad hoc wireless networks are minimum route acquisition 
delay, quick route reconfiguration, loop-free routing, a 
distributed routing approach, minimum control overhead, 
scalability, quality of service, support for time-sensitive 
traffic, security and privacy. 

The major challenge in MANETs is multi-hop behavior. 
Several routing protocols have been proposed for ad hoc 
networks. These protocols are classified into three categories: 
proactive (table-driven) routing protocols, reactive 
(on-demand) routing protocols and hybrid routing protocols. 
In table-driven routing protocols, all nodes keep the network 
topology information in the form of routing tables by 
periodically exchanging information. Routing information is 
flooded through the whole network. If a node requires a route 
to a destination, it runs a path finding algorithm on the 
topology information it maintains. Examples of proactive 
protocols are the Destination Sequenced Distance Vector 
routing protocol (DSDV), the Wireless Routing Protocol 
(WRP) and the Cluster-head Gateway Switch Routing 
protocol (CGSR). Reactive routing protocols do not maintain 
topology information; whenever the source node requires a 
route, it initiates a path finding process. These protocols do 
not exchange routing information periodically. Examples of 
reactive protocols are the Ad hoc On-demand Distance Vector 
routing protocol (AODV), the Temporally Ordered Routing 
Algorithm (TORA), Location Aided Routing (LAR) and the 
Dynamic Source Routing protocol (DSR). Hybrid routing 
protocols combine the best features of proactive and reactive 
routing protocols. Examples are the Zone Routing Protocol 
(ZRP) and the Core Extraction Distributed Ad hoc Routing 
protocol (CEDAR). 

In the on-demand distance vector routing protocol, the 
source node initiates an RREQ packet and broadcasts it to its 
neighbors. This broadcasting is referred to as flooding. For 
example, the source S may initiate a destination search using 
an RREQ packet. This packet contains the location of S, the 
destination ID and some control bits. If the destination has 
not been reached, each intermediate node that receives the 
RREQ packet rebroadcasts it to all its neighbors until the 
destination is found. Blind flooding causes unnecessary 
collisions and wastes bandwidth, and several optimization 
techniques have been applied to this problem. Flooding can 
be classified into simple or blind flooding, probability based 
flooding, area based flooding and neighbor knowledge 
methods. Neighbor knowledge based flooding is further 
classified into clustering based flooding, selecting forwarding 
neighbors and internal node based flooding. 
Straightforward flooding is very costly and results in 
serious redundancy, contention and collision, which is known 
as the broadcast storm problem. Recently, probabilistic 
broadcast schemes for MANETs have been suggested [2,3] to 
address the broadcast storm problem associated with simple 
flooding. In the probabilistic scheme, each node rebroadcasts 
a received RREQ packet with a given fixed probability p. This 
method reduces the routing overhead. This paper introduces 
a new route discovery algorithm that uses probabilistic 
broadcast in the route discovery process. For our evaluation 
we used regulated probabilistic route discovery in the AODV 
protocol. It helps reduce the overhead of the route discovery 
process while maintaining performance comparable to that 
of conventional AODV in terms of reachability and saved 
rebroadcasts. The rest of the paper is organised as follows: 
section 2 reviews related work; section 3 presents the 
implemented regulated probabilistic route discovery process; 
section 4 evaluates the performance of the implemented 
route discovery process; and section 5 gives conclusions and 
future directions. 

II. RELATED WORK 

There has been a lot of research on the communication 
overhead associated with the dissemination of RREQ packets 
for route discovery and maintenance processes in MANETs 
[4]. In blind flooding, every node in the ad hoc network 
retransmits a message to its neighbours upon receiving it for 
the first time. Although this type of flooding is very simple 
and easy to implement, it can be very costly and may lead to 
a serious problem, often known as the broadcast storm 
problem [5], that is characterized by highly redundant packet 
retransmissions, network contention and collision. The 
authors of [6] studied the flooding protocol, and their 
experimental study indicated that a rebroadcast could provide 
at most 61% additional coverage of the original area and only 
41% additional coverage on average over that already covered 
by the previous broadcast transmission. Therefore, the 
authors conclude that rebroadcasts are very costly and should 
be used with caution. 

The paper [5] classifies broadcasting techniques into 
the following four categories: simple flooding, probability- 
based, area-based, and neighbour knowledge schemes. In the 
flooding scheme, every node retransmits to its neighbours as a 
response to every newly received packet. The probability-based 
scheme is one way of controlling broadcast message 
floods [9]: each node that receives a broadcast message 
rebroadcasts it with a predefined probability p. Obviously, 
when p = 1 this scheme resembles simple (blind) flooding. In 
the area based scheme, a node determines whether to 
rebroadcast a packet or not by calculating and using its 
additional coverage area [8]. Of these, the family of 
probabilistic schemes is of interest in this study. In this 
category of broadcasting techniques, a mobile node 
rebroadcasts packets according to a certain probability. 

The paper [7] describes a probabilistic scheme where 
the probability p of a node retransmitting a message is 
computed from the local density n (i.e., the number of 
neighbours) and a fixed value k for the efficiency parameter to 
achieve the reachability of the broadcast. This model has the 
disadvantage of being locally uniform. Indeed, each node of a 
given area receives a broadcast and determines the probability 
according to a constant efficiency parameter (to achieve some 
reachability) and from the local density. 



The papers [2,3] also describe a dynamic probabilistic 
scheme, which uses a combination of probabilistic and counter- 
based approaches. This scheme dynamically adjusts the 
rebroadcast probability p at every mobile host 
according to the value of the packet counters. The 
value of a packet counter does not necessarily correspond to the 
exact number of neighbours of the current host, since some of 
its neighbours may have suppressed their rebroadcasts 
according to their local rebroadcast probability. In addition, 
the decision to rebroadcast is made after a random delay, 
which increases latency. 

The dynamic probabilistic broadcast schemes forward 
RREQ packets with a probability that is dynamically adjusted 
using local topology information. Topology information is 
obtained by proactive exchange of "HELLO" packets between 
neighbours to construct a one-hop neighbour list at every host. 
The adjusted probabilistic flooding scheme is a 
combination of the probabilistic and knowledge- 
based approaches. Both approaches presented in 
[2,3] incur an extra overhead, i.e., 
before calculating the probability, the average number of 
neighbour nodes must be known in advance. 

Of the broadcasting methods described above, 
the simplest one is flooding, which also produces the 
highest number of redundant rebroadcasts. The 
probabilistic approaches reduce the number of 
rebroadcasts at the expense of reachability [10,11]. 
Counter-based algorithms have better reachability 
and throughput, but suffer from relatively longer 
delay. Area-based algorithms need support from GPS 
or other location devices, and the neighbour- 
knowledge-based approaches require the exchange 
of neighbourhood information between hosts. In 
this paper, we propose a new probabilistic 
approach that dynamically fine-tunes the 
rebroadcast probability for route request packets 
(RREQs) according to the number of neighbour 
nodes, to yield higher throughput, higher saved 
rebroadcasts, better reachability, and fewer route 
requests. The details of the proposed approach are 
described in the following section. 

III. PROBABILISTIC FLOODING 

The probabilistic flooding scheme is an alternative 
to simple flooding for finding the destination node. It is 
similar to simple flooding, except that nodes only rebroadcast 
with a predetermined probability. It is used to reduce 
redundancy and so alleviate the broadcast storm problem. 
Every neighbor may rebroadcast the packet at most once, 
based on some random condition. This continues until all 
reachable network nodes have received the packet. If the 
probability is 100%, this is 






equivalent to flooding. In the probabilistic flooding scheme, 
a node that receives a broadcast message rebroadcasts it 
with a predetermined probability p. All nodes have the 
same probability of rebroadcasting the message, regardless of 
their number of neighbours. In dense networks, many nodes 
share similar transmission coverage, so a rebroadcast 
probability below 1 limits the frequency of rebroadcasts and 
can save network resources without affecting delivery ratios. 
Sparse networks have less shared coverage, so some nodes 
may not receive all the broadcast packets unless the 
probability parameter is high. If the rebroadcast probability p 
is set to a small value, reachability will be poor. On the other 
hand, if p is set to a large value, many redundant rebroadcasts 
will be generated. The rebroadcast probability must therefore 
be high at hosts in sparser areas and low at hosts in denser 
areas. Estimating the local density requires mobile hosts to 
periodically exchange "HELLO" messages with their 
neighbours to construct a 1-hop neighbour list at each host. A 
large number of neighbours implies that the host is in a dense 
area, whereas a small number of neighbours implies that the 
host is in a sparser area. So, based on the number of 
neighbours, we increase the rebroadcast probability in sparse 
areas and decrease it in dense areas. 

This kind of adaptation leads to a dynamic balance between 
rebroadcast probabilities, and the probabilities reached at this 
balance should give good results for the broadcast mechanism. 
In this paper we use a simple adaptation algorithm. The 
regulated probabilistic flooding algorithm, given in 
Algorithm 1, works as follows. When an intermediate node X 
hears a broadcast message m, it rebroadcasts the message 
according to a probability p. If the broadcast message is 
received for the first time and the number of neighbours of 
node X is less than the average number of neighbours, the 
node sets a high rebroadcast probability p. If the node has 
very few neighbours, that is, fewer than half the average 
number of neighbours, it sets the rebroadcast probability to 
p = 1 (simple flooding). If node X has more neighbours than 
the average, it sets a low rebroadcast probability p. 

The regulated rebroadcast probability computed at each node 
is summarised in Algorithm 1. 



Algorithm 1: The regulated probabilistic flooding algorithm 

On receiving a broadcast packet m at node X: 
    Avg = average number of neighbours (threshold value) 
    n = degree of node X (number of neighbours) 
    if packet m is received for the first time then 
        if n < Avg then 
            if n < Avg / 2 then 
                /* node X has very few neighbours */ 
                set rebroadcast probability p = 1   /* simple flooding */ 
            else 
                /* node X has few neighbours */ 
                set high rebroadcast probability p = p1 
            endif 
        else 
            /* node X has many neighbours */ 
            set low rebroadcast probability p = p2 
        endif 
    endif 
    generate a random number R over [0, 1] 
    if R < p then 
        rebroadcast message 
    else 
        drop message 
    endif 
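
Algorithm 1 can be sketched in Python as follows. The values chosen for the high and low probabilities (`p1 = 0.9`, `p2 = 0.5`) are placeholders, since the text does not state them, and dropping a packet that has been seen before is an assumption made here to keep the function total. The random source is passed in so that the decision can be tested deterministically.

```python
import random


def rebroadcast_probability(n, avg, p1=0.9, p2=0.5):
    """Pick p from the neighbour count n and the threshold avg."""
    if n < avg / 2:
        return 1.0        # very few neighbours: behave like simple flooding
    if n < avg:
        return p1         # few neighbours: high probability (placeholder)
    return p2             # many neighbours: low probability (placeholder)


def should_rebroadcast(n, avg, first_time, rng=random.random,
                       p1=0.9, p2=0.5):
    """Decide whether node X rebroadcasts a received packet."""
    if not first_time:
        return False      # assumption: duplicates are never rebroadcast
    return rng() < rebroadcast_probability(n, avg, p1, p2)
```

A node with 2 neighbours in a network whose average is 10 always rebroadcasts, while a node with 12 neighbours rebroadcasts only about half of the time under the placeholder value of `p2`.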



Our algorithm is a dynamic approach. It 
dynamically regulates the rebroadcast probability p at each 
mobile host according to the local number of 
neighbours. The value of p changes when the host moves to a 
different neighbourhood. In a sparser area, the rebroadcast 
probability is larger (or simple flooding is used), and in a 
denser area the probability is lower. Compared with the 
probabilistic approach where p is fixed, our algorithm 
achieves higher saved rebroadcasts. Also, in our algorithm the 
decision to rebroadcast is made immediately after receiving a 
packet, without any delay. 

We present an estimate of the average number of 
neighbours as the basis for the selection of the value of p. Let A 
be the area of the ad hoc network, N be the number of mobile 
hosts in the network and R be the transmission range of a host. 
The average number of neighbours can be obtained as shown 
below. 

$$Avg = (N - 1) \times 0.8 \times \frac{\pi R^2}{A}$$
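
As a quick numerical check of this estimate, the sketch below evaluates it for the simulation setting used later (600 m x 600 m area, 250 m transmission range). It assumes the formula reads Avg = (N - 1) * 0.8 * (pi * R^2 / A), with R the transmission range; the function name is illustrative.

```python
import math


def average_neighbours(num_hosts, tx_range, area):
    """Estimated average number of neighbours per host."""
    return (num_hosts - 1) * 0.8 * (math.pi * tx_range ** 2 / area)
```

For 50 hosts this gives an average of roughly 21 neighbours, so under Algorithm 1 a host with fewer than about 11 neighbours would fall back to simple flooding.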

IV. PERFORMANCE EVALUATION 

The performance of our regulated probability algorithm has 
been evaluated against the fixed rebroadcast probability and 
regular flooding schemes. All three methods have been 
implemented in the AODV protocol. The metrics for 
comparison are saved rebroadcasts (SRB) and reachability. 
For this evaluation we used the GloMoSim network simulator 
to conduct experiments measuring the performance of 
probabilistic flooding. The original AODV protocol uses simple 
flooding to broadcast route requests in order to find the 
destination. In this paper we have implemented two AODV 
variants: one using the probabilistic method with a fixed 
probability, and the other using our adjusted probabilistic 
algorithm. The main aim of this research is to reduce the 
number of rebroadcast packets during the route discovery 
process, thereby reducing network traffic, decreasing packet 
collisions and increasing overall network performance. 

Since this probabilistic approach does not suit all 
scenarios, there is a small chance that the route requests cannot 






reach the destination. In this situation it is necessary to 
regenerate the route request if the previous route request 
failed to reach the destination. The AODV protocol uses 
flooding in the route discovery process, which means that all 
route requests will reach their destinations if the network is 
not partitioned. Based on this observation, our algorithm is 
expected to perform better than AODV in dense networks. 
The simulated network used to analyse rebroadcast 
probability versus network density contains 25 to 100 nodes 
placed randomly in a 600 m x 600 m area. Each node transmits 
within a 250 m radius, and the network bandwidth is 2 Mbps. 
The random waypoint model is used to simulate 25 mobility 
patterns, with retransmission probabilities ranging from 0.5 to 
1.0 in increments of 0.1 per experiment. 

The random waypoint model considers nodes that follow a 
recurring motion-pause mobility pattern. At the beginning of 
the simulation each node remains stationary for a pause time, 
then chooses a random destination and starts moving towards 
it with a speed selected from a uniform distribution. After the 
node reaches that destination, it again stands still for a pause 
time interval and picks a new destination and speed. This 
cycle repeats until the simulation terminates. A maximum 
speed of 10 m/s, with pauses between movements, is 
considered for the purposes of this study. The simulation 
parameters are summarised in Table 1. 

Table 1: Parameters used in simulation 

Simulation parameter    Value 
Simulator               GloMoSim v2.03 
Network range           600 m x 600 m 
Transmission range      250 m 
Mobile nodes            25, 50, 75, 100 
Mobility                Random waypoint model 
Bandwidth               2 Mbps 
Packet size             512 bytes 
Packet rate             10 packets per second 
Simulation time         900 s 

Figure 1: SRB vs rebroadcast probability with node speed 10 m/s 

The performance of broadcast protocols can be measured in 
terms of message retransmissions with respect to the 
number of nodes in the network. In this work, we use 
rebroadcast savings, which is a complementary measure and is 
precisely defined below. The next important metric is 
reachability, which is defined in terms of the ratio of nodes that 
received the broadcast message out of all the nodes in the 
network. The formal definitions of these two metrics are given 
as follows. 
Figure 2: SRB of three broadcast schemes vs network density with 
node speed of 10 m/s 

Saved ReBroadcasts (SRB): Let r be the number of nodes 
that received the broadcast message and let t be the number 
of nodes that actually transmitted the message. Saved 
rebroadcast is then defined by (r - t)/r. 

Reachability (RE): the percentage of nodes that 
received the broadcast message out of the total number of 
nodes in the network. For meaningful information, the total 
number of nodes should include only those nodes that are part 
of a connected component in the network. For disconnected 
networks this measure should be applied to each of the 
components separately. 
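
The two metrics defined above reduce to simple ratios. The sketch below is illustrative only; the function names are hypothetical and both results are returned as fractions rather than percentages.

```python
def saved_rebroadcast(received, transmitted):
    """SRB = (r - t) / r, with r receivers and t actual transmitters."""
    if received <= 0:
        raise ValueError("at least one node must have received the message")
    return (received - transmitted) / received


def reachability(received, total_nodes):
    """Fraction of nodes in the (connected) network that got the message."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return received / total_nodes
```

For example, if 40 nodes receive a broadcast and only 24 of them retransmit it, the saved rebroadcast is 0.4, i.e. 40%.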

In this experiment we compare the saved rebroadcasts 
(SRB) of the fixed probability scheme and our adjusted 
probabilistic algorithm. Figure 1 shows that our algorithm 
achieves a significantly higher SRB for rebroadcast 
probabilities ranging from 0.5 to 1.0 in increments of 0.1 per 
trial, for a network of 50 nodes with a maximum speed of 
20 m/s. Figure 2 shows the SRB of the fixed probabilistic 
scheme against our adjusted probabilistic algorithm. The SRB 
of the adjusted probabilistic algorithm is 40% in low-density 
networks (25 nodes) and 50% in high-density networks 
(150 nodes). 



The SRB of the fixed probabilistic scheme with the
probability set to 0.7 is around 28-33% at any network
density. Figure 3 shows that reachability increases when
network density increases, regardless of which algorithm is
used. The simple flooding method has the best
performance in reachability, as expected. The adjusted
probabilistic algorithm achieves a reachability above 95% at
any network density. At all network densities, the
reachability of our algorithm is better than that of
the probabilistic scheme with the probability set to 0.7. In
higher density networks, i.e. for 120 hosts and above, the
reachability of our approach and of flooding are evenly matched,
with both performing very adequately (close to 100%). We
have noted that the extra redundancy of RREQ transmissions is
what results in more contention and collisions. Considering all
the previous results, the adjusted probabilistic-enabled AODV
is shown to improve AODV performance in all aspects for
scenarios with low mobility.

Figure 3: Reachability of three broadcast algorithms

V. CONCLUSIONS

AODV normally uses simple flooding during the route
discovery process. In this paper we discussed the performance of
adjusted probabilistic flooding on the AODV protocol, with the
aim of increasing the saved rebroadcasts of route requests. The
algorithm determines the rebroadcast probability from the number
of neighbouring hosts, that is, from the network density. To
improve the saved rebroadcasts, the rebroadcast probability of
low-density nodes is increased while that of high-density nodes
is decreased. Compared with simple flooding, our simulation
results show that in high-mobility, high-density networks the
adjusted probabilistic flooding algorithm can improve the saved
rebroadcasts by up to 45% without affecting reachability. We plan
to evaluate the performance of adjusted probabilistic flooding on
other on-demand routing protocols such as Dynamic Source Routing
(DSR).

Abstract - In this paper, a new hybrid intelligent model
comprising a cluster allocation and an adaptation component is
developed for solving classification and pattern recognition
problems. Its computational ability has been verified through
various benchmark problems and biometric applications. The
proposed model consists of two components: cluster
distribution and adaptation. In the first module, mean patterns
are distributed into a number of clusters using
evolutionary fuzzy clustering, which is the basis for network
structure selection in the next module. In the second module,
training and subsequent generalization are performed by
syndicate neural networks (SNNs). The number of SNNs required in
the second module equals the number of clusters, and
each network contains as many output neurons as the
maximum number of members assigned to each cluster. The
proposed fusion of evolutionary fuzzy clustering with
neural networks yields superior performance in classification
and pattern recognition problems. Performance has been
evaluated over a wide spectrum of benchmark problems
and real-life biometric recognition problems. Experimental
results demonstrate the efficacy of the methodology over
existing ones.



Keywords- Hybrid Intelligence; Clustering; Syndicate Neural Network.

I. Introduction



Computational intelligence is an emerging field that is
widely applied to classification and pattern
recognition problems. An efficient synergy of
evolutionary computation, fuzzy logic and neural networks
can lead to computationally efficient,
high-performance systems. Computational intelligence based
methods have been applied to the efficient solution of
various real-world problems [1]-[3] over the last decades. In
the recent past, these techniques have also been widely applied
to biometric applications [4]-[6]. The strength and
effectiveness of these techniques have been described in
the literature [7]-[10]. A hybrid intelligent system, which
combines the evolutionary, fuzzy and neural paradigms
into a single system, can be used for pattern recognition and
classification applications. Fuzzy clustering plays an
important role in various classification problems, while
introducing evolutionary algorithms into fuzzy clustering
provides better optimization of various aspects of
clustering [11]. Fuzzy clustering has proven comparatively
better than traditional clustering techniques on various
classification problems [12].
Some variants of fuzzy clustering have been introduced
for classification problems dealing with noise [13].
Combining fuzzy clustering with evolutionary computation
is quite efficient for solving classification and recognition
problems [11], [14]. Many variants of evolutionary fuzzy
clustering exist [11], but some of them still yield poor
classification accuracy because of their unsupervised nature.
On the other hand, robust performance and quick convergence
of an artificial neural network with low complexity are vital
for its wide application [3], [7], [15]. A careful choice of
network structure makes the network computationally more
efficient and also overcomes some general problems of
neural networks. No general methodology exists for
selecting the best neural network structure; the choice
also depends on the problem itself. Therefore, we
propose a hybrid intelligent model containing two
modules, in which the first decides the structure of the
syndicate neural network in the second.

The proposed model consists of two components: a
cluster distribution module and an adaptation module. In the
first module, mean patterns are distributed into a number of
clusters using evolutionary fuzzy clustering, which is the
basis for network structure selection in the next module. The
proposed evolutionary fuzzy c-means clustering is further
generalized with the Minkowski distance metric to
provide flexibility in the shapes of the clusters
[11]. This is named EFC-MD. In the second module, training
and subsequent generalization are performed by syndicate
neural networks (SNNs). The number of SNNs required in
the second module equals the number of clusters
generated in the first module, and each network contains
as many output neurons as the maximum number of
members assigned to each cluster. The proposed
fusion of evolutionary fuzzy clustering with neural networks,
termed EFC-MD-SNN, yields superior performance in
classification and pattern recognition problems. The most
widely used training algorithm, the error back propagation
learning algorithm, is used for training in the
adaptation module. Performance has been evaluated
over a wide spectrum of benchmark problems and
real-life biometric recognition problems. Experimental
results demonstrate the efficacy of the methodology over
existing ones. The impact of varying the number of clusters and
the number of members is also investigated.

The rest of the paper is organized as follows. Section II
presents the mathematical background and design of the
hybrid intelligent model, and elaborates the two basic
functional components of the proposed model. To
estimate the strength and effectiveness of the proposed model,
a number of benchmark problems from various fields are
considered in Section III. Section IV is devoted to the
performance evaluation of the proposed hybrid model in
biometric applications. Section V presents the
inferences and a discussion of the effect of the various
parameters considered in the model. Finally, Section VI
concludes the paper.

II. Hybrid Intelligent Model 

This section describes the combination and coordination
of three major paradigms of computational
intelligence, viz. evolutionary, fuzzy and neural, into a single
system, whose intelligent behavior is demonstrated in the next
section. The proposed hybrid intelligent model incorporates
the fusion of two basic activities: cluster allocation and
adaptation. The general structure of the proposed model is
presented in fig. 1. It consists of a cluster allocation module
and an adaptation module. The first module distributes the
mean patterns into a number of clusters.
The proposed mechanism for cluster allocation is fuzzy
c-means (FCM) clustering combined with evolutionary search.
Different runs of conventional FCM clustering generate
different partitionings [11]. Therefore, evolutionary search is
combined with conventional FCM to find the optimal
partitioning among a number of runs of FCM clustering. The
proposed evolutionary fuzzy c-means (EFC) clustering is
further generalized with the Minkowski distance metric and
named EFC-MD. The Minkowski distance yields variable
cluster shapes, while the conventional Euclidean distance
restricts clusters to a spherical shape [11]. The outcome of this
module is the fuzzy distribution of mean patterns into a
number of clusters. These clusters decide the structure of the
syndicate neural network (SNN) in the adaptation module. The
adaptation module is devoted to the learning process for the
patterns generated in the previous module. It is also
responsible for generalization on data not
used in training (test data), and hence yields the
classification/recognition results to the user. In the adaptation
module, the number of SNNs is the same as the number of
clusters generated by the first module, while the number of
output neurons in each SNN is the same as the maximum
number of cluster members (MCM). The number of hidden
neurons in the network depends on the problem
considered. Learning in the model is performed by the back
propagation algorithm with momentum. The maximum
value among the maxima of the outputs of each SNN
determines the class of the corresponding recognized
pattern.
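The decision rule described at the end of this section can be sketched as follows. This is a minimal illustration under stated assumptions: the outputs of the trained SNNs are taken to be already available as a C x MCM array, and a hypothetical allocation table maps each output neuron to a class index. The function name `classify` and both array layouts are illustrative and are not specified in the paper.

```python
import numpy as np


def classify(snn_outputs, allocation):
    """Return (class index, score) for one test pattern.

    snn_outputs: (C, MCM) array; row i holds the outputs of the i-th SNN
                 (assumed layout).
    allocation:  (C, MCM) integer array; entry [i, j] is the class assigned
                 to output neuron j of SNN i (assumed layout).
    """
    snn_outputs = np.asarray(snn_outputs, dtype=float)
    allocation = np.asarray(allocation)
    # The maximum over the per-network maxima is the global maximum.
    i, j = np.unravel_index(np.argmax(snn_outputs), snn_outputs.shape)
    return int(allocation[i, j]), float(snn_outputs[i, j])


# Hypothetical outputs of C=2 networks with MCM=2 output neurons each.
outputs = [[0.10, 0.70],
           [0.90, 0.20]]
table = [[0, 2],
         [3, 1]]
label, score = classify(outputs, table)
print(label, score)
```

Taking a single arg-max over the whole array is equivalent to first taking the maximum of each SNN and then the maximum of those maxima, which is how the rule is worded in the text.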



A. Cluster Allocation



This module distributes the patterns of the
training dataset into a number of clusters using the proposed
EFC-MD. A mean pattern is computed for each class by
averaging the samples of that class selected
for training. The first part of this module distributes
the mean patterns into the clusters,
while the second part assigns the maximum
number of cluster members (MCM) to each cluster.
The evolutionary search is applied only to find the
optimal partitioning. It is important to mention here that
we execute the EFC-MD algorithm for different numbers of
clusters in order to assess the impact of the number of
clusters on accuracy.

1) Allocation of mean patterns into clusters (EFC-MD) 

Let the considered training set consist of N classes, each
containing S patterns. Let $x_{jk}$ be the $j$-th pattern of the
$k$-th class, where $1 \le j \le S$ and $1 \le k \le N$. The mean
vector for each class is

$$x_k = \frac{1}{S} \sum_{j=1}^{S} x_{jk}$$
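The per-class mean patterns described above can be computed as in the following minimal sketch. The function name `class_means` and the toy data are hypothetical; the only assumption is that the training patterns are stored as rows of an array with one class label per row.

```python
import numpy as np


def class_means(patterns, labels):
    """Average the training patterns of each class.

    patterns: (n_samples, n_features) array of training patterns.
    labels:   (n_samples,) array of class labels.
    Returns the sorted class labels and one mean pattern per class.
    """
    patterns = np.asarray(patterns, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([patterns[labels == k].mean(axis=0) for k in classes])
    return classes, means


# Toy example: N = 2 classes with S = 2 patterns each.
X = [[0, 0], [2, 2], [10, 0], [10, 4]]
y = [0, 0, 1, 1]
classes, means = class_means(X, y)
print(means)
```

The resulting N mean patterns, not the individual training samples, are what the cluster allocation module distributes into clusters.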



[Figure 1: block diagram. The data set is split into a training set and a testing set, which pass through the cluster allocation module (EFC-MD) and then the adaptation module (SNN) of the hybrid intelligent model.]

Figure 1. The Proposed Model

Let $X = \{x_1, x_2, x_3, \dots, x_N\}$ be the mean patterns of the N
classes, where each $x_k \in X$ is a vector of attribute values.
The EFC clustering algorithm divides the N data sets into a fuzzy
partition matrix U (size C x N) containing C clusters. The
membership function in U is denoted $\mu_{ik}$ and satisfies
the following constraints:

$$\mu_{ik} \in [0, 1], \quad 1 \le i \le C \ \text{and} \ 1 \le k \le N \qquad (1)$$

$$0 < \sum_{k=1}^{N} \mu_{ik} < N, \qquad \sum_{i=1}^{C} \mu_{ik} = 1 \qquad (2)$$

$$2 \le C < N \qquad (3)$$



In EFC clustering, each chromosome contains a sequence of
attribute values representing a cluster. Let $\Theta_{ki}$ be a
chromosome, defined as

$$\Theta_{ki} = \begin{cases} 1 & \text{if the } k\text{-th data set belongs to the } i\text{-th cluster} \\ 0 & \text{otherwise} \end{cases}$$

where $1 \le i \le C$ and $1 \le k \le N$.

Initially, C clusters are encoded in each chromosome and the
population is initialized randomly, so a different initial
population is generated in each run. The optimization of the
fuzzy partition matrix U becomes more general when it is
associated with the Minkowski distance measure. The objective
function of evolutionary fuzzy c-means clustering with
Minkowski distance (EFC-MD) has a generalization
parameter p whose variation produces different cluster
shapes. The objective function for EFC-MD is defined as
follows:



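$$J(\mu, O) = \sum_{k=1}^{N} \sum_{i=1}^{C} (\mu_{ik})^{m} \, d^{2\beta}(x_k, O_i) \qquad (4)$$

where $d^{\beta}(x_k, O_i) = \lVert x_k - O_i \rVert_p^{\beta}$, with
$0 < \beta \le 1$ and $1 \le p \le \infty$.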



The fuzzifier m is a weighting exponent that
determines the degree of fuzziness. In general, the value of
m lies between one and infinity and greatly influences the
performance of the fuzzy c-means (FCM) clustering algorithm
[29]. When m approaches infinity, the solution is the
centre of gravity of the whole dataset, and when m=1 the
algorithm behaves like classical c-means. The interval (1, 3] is
the best choice for m, and m=2 is most commonly used in the
literature. Selecting a suitable fuzzifier m is therefore very
important for the implementation of FCM. In [29], it has been
shown that a proper weighting exponent value depends on the data
itself. The main motivation for considering the Minkowski
distance is to give the proposed algorithm the freedom to
generate clusters of variable shape, which is not possible with
the Euclidean distance measure. The exact shapes of the
clusters generated (which may be boxes, ellipsoids,
spheres and others [11]) depend on the value of the
generalization parameter p. With this distance measure, the
clusters are therefore not forced toward the spherical shape
that the Euclidean distance typically produces.
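The effect of the generalization parameter p can be seen in a small sketch. It uses `numpy.linalg.norm`, whose `ord` argument computes the p-norm of a vector; the helper name `minkowski` and the sample point are illustrative assumptions.

```python
import numpy as np


def minkowski(x, o, p):
    """Minkowski distance of order p between a pattern x and a centre o."""
    diff = np.asarray(x, dtype=float) - np.asarray(o, dtype=float)
    return float(np.linalg.norm(diff, ord=p))


x, centre = [1.0, 1.0], [0.0, 0.0]
for p in (1, 2, 4, np.inf):
    print(f"p = {p}: d = {minkowski(x, centre, p):.4f}")
```

For the diagonal point (1, 1) the distance to the origin falls from 2 at p=1 to 1 as p tends to infinity. Contours of equal distance are diamonds for p=1, circles for p=2, and approach axis-aligned squares for large p, which is why changing p changes the cluster shapes that a distance-based objective favours.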

The fitness function f is the criterion that determines the
best partitioning in the evolutionary search. It is chosen
as in [19], and its value is inversely proportional to the
Xie-Beni index (XB). A higher value of f gives survival to
the fittest population, and the best population is selected among
the offspring generated in different runs. The EFC-MD
algorithm iterates through the necessary conditions for
minimizing the objective function, updating the
cluster centres and the membership function as follows:



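$$O_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^{m} \, x_k}{\sum_{k=1}^{N} (\mu_{ik})^{m}} \qquad (5)$$

$$\mu_{ik} = \frac{\left( 1 / d^{2\beta}(x_k, O_i) \right)^{\frac{1}{m-1}}}{\sum_{j=1}^{C} \left( 1 / d^{2\beta}(x_k, O_j) \right)^{\frac{1}{m-1}}}, \quad i = 1, \dots, C \qquad (6)$$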



Let $J(\mu, O)^{(t)}$ be the objective function at the $t$-th
iteration. The EFC-MD algorithm terminates when

$$\left| J(\mu, O)^{(t+1)} - J(\mu, O)^{(t)} \right| < \varepsilon \qquad (7)$$

where $\varepsilon$ is a threshold. Initially, the partition matrix U
is initialized randomly. Let $U^{(0)}, U^{(1)}, \dots, U^{(l)}$ be the
populations generated by the $l$ runs of this algorithm. The best
U is selected based on the highest value of the fitness function
f and is then used as the parent to generate new offspring.
Repeated execution of this algorithm thus
produces the best partitioning among the populations
generated by the different runs.
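The procedure above can be sketched in a few lines of NumPy. This is a simplified illustration and not the authors' implementation: it assumes beta = 1, uses the weighted-mean centre update (which is an exact minimizer only for p = 2), and replaces the evolutionary search with plain random restarts that keep the partition of highest fitness. All function names, default values and the toy data are hypothetical.

```python
import numpy as np


def pairwise_minkowski(A, B, p):
    """Distances between rows of A (n, d) and rows of B (c, d); shape (c, n)."""
    diff = np.abs(A[None, :, :] - B[:, None, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)


def fcm_md(X, C, m=2.0, p=2.0, eps=1e-6, max_iter=300, seed=None):
    """One run of fuzzy c-means with squared Minkowski dissimilarity."""
    rng = np.random.default_rng(seed)
    U = rng.random((C, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)               # each column sums to 1
    J_prev = np.inf
    for _ in range(max_iter):
        W = U ** m
        O = (W @ X) / W.sum(axis=1, keepdims=True)  # centre update
        D = pairwise_minkowski(X, O, p) ** 2 + 1e-12
        inv = D ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)    # membership update
        J = float(((U ** m) * D).sum())             # objective value
        if abs(J_prev - J) < eps:                   # stopping rule
            break
        J_prev = J
    return U, O, J


def fitness(X, U, O, m=2.0, p=2.0):
    """Inverse of a Xie-Beni style index (compactness over separation)."""
    D = pairwise_minkowski(X, O, p) ** 2
    sep = pairwise_minkowski(O, O, p) ** 2
    sep[np.diag_indices_from(sep)] = np.inf         # ignore i == j
    xb = ((U ** m) * D).sum() / (X.shape[0] * sep.min())
    return 1.0 / xb


def best_partition(X, C, runs=10, m=2.0, p=2.0):
    """Keep the run with the highest fitness (stand-in for the search)."""
    best = None
    for seed in range(runs):
        U, O, _ = fcm_md(X, C, m=m, p=p, seed=seed)
        f = fitness(X, U, O, m, p)
        if best is None or f > best[0]:
            best = (f, U, O)
    return best


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
f, U, O = best_partition(X, C=2, runs=3, p=4.0)
print(U.argmax(axis=0))
```

A genetic implementation would additionally apply selection, crossover and mutation to the encoded partitions; the restart loop here only mirrors the idea of choosing the best partition among several runs.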

2) Fixed member allocation 

In conventional FCM, the size of each cluster varies
with its number of members. To avoid this variability
and to match the associated syndicate neural network
(SNN) of the hybrid intelligent model, we need to
obtain a uniform structure for all clusters. This
involves allocating a fixed number of elements to each
cluster, for which we select a fixed number of the elements with
the highest membership grades in each cluster. The maximum
number of top membership grade elements assigned to a
cluster is called the maximum cluster member (MCM). This
parameter also plays an important role in determining the
number of output neurons in the SNN and the accuracy of the
system. In the empirical evaluation of the proposed model, we
performed experiments with varying MCM and
number of clusters and report the cases that yield
reasonably good performance.
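The fixed member allocation can be illustrated with a minimal sketch, assuming the fuzzy partition matrix U is stored as a C x N array of membership grades. The function name `allocate_members` and the example matrix are hypothetical.

```python
import numpy as np


def allocate_members(U, mcm):
    """Pick, for every cluster, the mcm patterns with the highest membership.

    U:   (C, N) fuzzy partition matrix (rows are clusters).
    mcm: number of members to keep per cluster.
    Returns a (C, mcm) table of pattern indices, best member first.
    """
    U = np.asarray(U, dtype=float)
    order = np.argsort(-U, axis=1, kind="stable")  # descending membership
    return order[:, :mcm]


# Hypothetical partition of N = 4 mean patterns into C = 2 clusters.
U = [[0.9, 0.2, 0.6, 0.1],
     [0.1, 0.8, 0.4, 0.9]]
print(allocate_members(U, mcm=2))
```

Because the selection is made per cluster, a pattern with comparable membership in several clusters may appear in more than one row of the table, and every row has exactly MCM entries, which fixes the number of output neurons of each SNN.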
B. Adaptation

This module performs two functions: learning and
classification/recognition. Its main function
is to train the allocated training patterns using syndicate
neural networks (SNNs), each containing a single hidden layer,
with the back propagation learning algorithm. Recognition of
test patterns is performed by the trained SNNs. For training, a
three-layer neural network is used for each SNN. The
number of output neurons in an SNN equals MCM,
while the number of SNNs in the
adaptation module equals the number of clusters. The
mean patterns are distributed to the SNNs as allocated by
the cluster allocation module. After that, the training patterns
are presented to each SNN for learning.

For testing, feature vectors of unknown patterns are fed
into the SNNs. Let $Op(M_1), Op(M_2), \dots, Op(M_C)$ be the
maximum outputs of the SNNs $M_1, M_2, \dots, M_C$ respectively. Let

$$\Phi = \max_{i=1,\dots,C} Op_k(M_i)$$

where $Op_k(M_i)$ is the maximum output of the $i$-th syndicate
network for the pattern $x_k$. A pattern is identified with the
cluster member for which the maximum value $\Phi$ is obtained.

III. Performance Evaluation Using Benchmark Datasets

To evaluate the performance of the proposed model,
we have used a wide range of benchmark problems and
biometric problems. This section first presents the
performance of the proposed model on standard datasets,
namely the wine, SPECTF Heart and MONK datasets. In the second
phase of experiments, we consider two standard
biometric datasets, the AT & T Bell Labs face and AR face datasets.
EFC-MD-SNN is compared with
conventional FCM, EFC-MD, a conventional neural network
(MLP) and other strategies presented in
refereed journal papers. This analysis is
presented through different measures, tables and graphs.

A. Wine Dataset Problem

The wine data set results from a chemical
analysis of wines based on 13 constituents that vary across three
different classes of wine [19]. There are 178 samples in
total across the three classes. In this experiment, we
used 58% of the data for training and the remaining 42% for
testing. Here, EFC-MD employs p=4 instead of the Euclidean
equivalent value (p=2) and achieves 97.33% testing accuracy.
Experiments were performed by varying the number of
clusters (C) and MCM. Table I presents the comparative
performance of FCM, EFC-MD, MLP and EFC-MD-SNN.
The neural networks use the same number of hidden
neurons and learning cycles. A testing accuracy of 99.03% is
achieved at C=2 and MCM=2 with the smallest possible neural
network structure.

TABLE I. COMPARISON OF ACCURACY FOR WINE DATASET

Method         Accuracy (%)
               Training Set   Test Set
FCM            54.66          55.33
EFC-MD         97.08          97.33
MLP            100            94.3
EFC-MD-SNN     100            99.03



B. SPECTF Heart Dataset Problem

This data set [17] is based on cardiac single photon
emission computed tomography (SPECT) images. Each
patient is classified into one of two categories, normal or
abnormal. The dataset contains 267 instances, each with 44
attributes. In [17], it is recommended to take 80 of the 267
instances for training and the remaining 187 for testing.
The comparative analysis of CLIP3 [20], FCM, EFC-MD, MLP
and EFC-MD-SNN is given in Table II. EFC-MD again
yields its best results when the generalization
parameter p of the Minkowski distance is 4.

TABLE II. COMPARISON OF CLASSIFICATION ACCURACY FOR SPECTF HEART DATASET PROBLEM

Method         Accuracy (%)
               Training Set   Test Set
CLIP3          -              77
FCM            100            67.91
EFC-MD         100            85.56
MLP            100            86.63
EFC-MD-SNN     100            89.3

We obtain 89.3% accuracy at C=2, MCM=2. The fusion of
EFC-MD with SNN again outperforms both
EFC-MD and MLP.



C. MONK Data Problem

The MONK problem was the basis of the first
international comparison of learning algorithms. There are
three MONK problems. The third MONK problem, with
random noise added to the data, is used in this paper. This
dataset consists of 554 patterns with seven features, and
the patterns are assigned to two classes. The training and
testing sets used here are the same as in [13]. Of the
122 training patterns, 60 belong to the positive
class and the remaining 62 belong to the negative class. The
testing set consists of 432 patterns, of which 228 belong to
the positive class and 204 belong to the negative
class. Here, EFC-MD performs better for p=2. The
experiments were also performed with varying values of C
and MCM. We obtained 98.15% accuracy at C=2, MCM=3
with the smallest possible SNN (10 hidden neurons) at
MSE=0.00098. Our method performs better than the SVM [13]
and the KFCM-FSVM technique described in [13], as shown in
Table III.

TABLE III. COMPARISON OF ACCURACY FOR MONK DATASET

Method         Accuracy (%)
               Training Set   Test Set
SVM            -              97.68
KFCM-FSVM      -              97.68
FCM            81.9           83.33
EFC-MD         100            96.29
MLP            100            96.99
EFC-MD-SNN     100            98.15



IV. Performance Evaluation Using Biometric Datasets

This section presents the performance evaluation of the
proposed EFC-MD-SNN model on standard biometric
datasets. Feature extraction and dimensionality reduction
of the data sets are performed with principal component analysis
followed by Fisher linear discriminant (PCA-FLD) [8]. PCA
is more suitable in applications where images vary only slightly
from one another, and it does not perform well when
major variations such as occlusion
and noise are present in the images. In our experiments, we also
observed that PCA combined with FLD yields very good results.

A. AT & T Bell Face Dataset 

The AT & T Laboratories face database [16] contains
400 face images of 40 individuals captured over
a 2-year period from subjects aged between 18 and 81.
There are 10 different images of each person with variations
in pose, scale, orientation and expression, as shown in fig. 2.
We use 200 randomly chosen images for training,
5 images per person, and the remaining 200 images for testing.
The dominant feature vectors of the faces are extracted for
training with PCA+FLD. In the training set, the mean
vector is calculated as the average of five images; thus we
get 40 mean vectors, which are distributed into a varying
number of clusters in the cluster allocation module. This is done
using the EFC-MD technique, which minimizes the objective
function in each run of the algorithm. A plot of the objective
function against the number of iterations for a single run of
EFC-MD is shown in fig. 3, which indicates the convergence of the
objective function for the AT & T face and AR face datasets. It
is worth mentioning that the proposed hybrid intelligent system
with EFC-MD clearly demonstrates the effect of varying
p in this experiment. While varying C and MCM,
we obtained a best accuracy of 97% with p = 2, whereas
accuracy of up to 99.5% was achieved at p = 4.






A cluster allocation table shows the cluster
allocation of the mean patterns of the training set, as
generated by the first module of the model. The allocation
table for the AT & T face dataset is shown in fig. 6. Each row
provides the information for the corresponding cluster, while
the columns correspond to the members of the clusters. The entry
in each cell of this table is the pattern number. This table
is constructed for C=10 and MCM=10.

Figure 3. Plot between objective function and number of iterations.

In the next phase, the adaptation module of the
proposed model is trained according to the cluster distribution
of the training set. On increasing the number of hidden neurons
in the SNN from 5 onward, we observe an increase in accuracy
up to 10 neurons, after which no further improvement is
observed. The adaptation module is run for up to 20,000
learning cycles in all experiments. When the effect of
variations in C and MCM is analyzed, we conclude
that the best results are obtained at C=10 and
MCM=10. A comparison of the proposed technique with other
recent work using similar methodologies is summarized in
Table IV for this dataset.



Figure 2. AT & T Bell Face dataset

TABLE IV. COMPARISON WITH RECENT METHODS FOR AT & T FACE

Method                      Testing Accuracy
DFLDA [25]                  96.2
NFL [26]                    96.875
RBFNN [8]                   98.08
Multiple classifier [27]    97.1
Combined framework [28]     97.65
Fuzzy MLP [1]               97.87
EFC-MD-SNN                  99.5

B. AR Face Dataset

The AR face data consists of more than 4000 color images of
126 individuals taken in two sessions separated by two
weeks [21]. Compared with the AT & T Bell Labs face dataset,
these images include more facial variations, including
illumination changes, facial disguises and expressions.
We select a subset containing 30 male and 30 female
subjects. For each subject, 14 images with variations in
facial expression and illumination (fig. 4) are
selected: seven images from session 1 are used for
training and seven images from session 2 are used for testing.
The images are cropped to 241x293 and
converted to gray scale. The cluster allocation table shown
in fig. 7 is constructed for C=12 and MCM=12. The effect of
varying the parameters is shown in Table V. We achieved
91.19% accuracy at C=12 and MCM=12 with an MSE of 0.0054.
Table VI compares the proposed strategy
with some other methods and again shows the superiority of our
method among them.

TABLE V. ACTUAL OUTPUTS FOR AR FACE DATASET

C     MCM   Learning Cycles   MSE      Accuracy (%)
                                       Training Set   Test Set
8     10    16000             0.0228   96.42          90.47
10    12    20000             0.0063   97.14          90.71
12    12    24000             0.0054   97.61          91.19
14    12    28000             0.0071   95.23          87.38



TABLE VI. COMPARISON OF CLASSIFICATION ACCURACY FOR AR FACE DATASET

Method                    Accuracy with test set
Nearest Neighbour [27]    89.7%
Nearest subspace [28]     90.3%
EFC-MD-SNN                91.19%




Figure 4. AR Face sample images of training set showing variations in expression and illumination.

Figure 5. Impact of various parameters on accuracy: (a) varying C, (b) varying MCM.

Figure 6. Cluster allocation table for AT & T Face dataset, in which each row represents a cluster and the number of columns corresponds to MCM.

Figure 7. Cluster allocation table for AR Face dataset, in which each row represents a cluster and the number of columns corresponds to MCM.

Figure 8. Plot (a) between number of feature vectors and accuracy, (b) between number of learning cycles and accuracy.

Figure 9. Plot of number of hidden neurons versus accuracy.



V. Inferences for biometric application of model 

To evaluate the performance of the proposed
intelligent system, the experiments are repeated with
varying values of C and MCM for both image datasets. Fig.
5 presents the effect of these variations graphically, keeping
10 neurons in the hidden layer of each SNN. C=10 and C=12 are the
best choices for the AT & T face dataset, while the best accuracy
for the AR database is obtained when C=12. The MCM value is
kept at 10 in both cases. On increasing C above these values,
a deterioration in model performance is observed.

Fig. 5 (b) shows the effect of varying MCM while
keeping C=12. We observe that the best accuracy is achieved at
MCM=10 and MCM=12 for the AT & T face and AR face
datasets respectively. Increasing MCM increases the fuzziness;
up to a point this enhances performance, but beyond that point
it degrades performance. We also observed that the
most suitable value of MCM is close to C in our
experiments. The number of feature vectors selected also
plays an important role in performance (fig. 8 (a)). For the
AT & T face dataset, maximum accuracy is obtained when 30
features are selected, while for the AR face dataset 50
features are required for best performance. A plot of accuracy
against the number of learning cycles is shown in fig. 8 (b).
Reasonably good accuracy is achieved when the adaptation
module of the proposed hybrid intelligent model is run for
about 20,000 and 24,000 learning cycles for the two face
datasets respectively. The number of hidden neurons
also plays a crucial role in the overall SNN design. The effect
of varying the number of hidden neurons per SNN is shown in fig.
9. Selecting fewer than 8 neurons degrades the performance of
the proposed model, while going beyond 10 neurons per SNN
does not yield any significant improvement in accuracy
for the number of learning cycles mentioned
above.



VI. Conclusion 

This paper proposes a novel hybrid intelligent model
based on evolutionary fuzzy clustering with Minkowski
distance (EFC-MD) and the syndicate neural network (SNN). The
synergy of all three paradigms produced a model
that demonstrated superiority not only on benchmark
problems but also on the biometric datasets considered for
real-life applications. EFC-MD provides an optimal partitioning
with clusters of general shape, and this pattern
information is then processed by the SNN to yield competitive
results in biometric applications. In biometric face recognition
problems involving variations such as pose, illumination
and expression, our method outperformed the existing
techniques. In this paper, we also reported the effect of varying
the number of clusters and MCM, and observed that on
increasing the number of clusters and MCM, the
performance increases up to a point and then starts
decreasing. In most of the examples presented in this paper,
we observed that the best accuracy is achieved when the number of
members in a cluster (MCM) and the number of clusters (C) are
nearly comparable.

Abstract - This paper proposes an image watermarking scheme
using wavelet tree quantization. The proposed approach embeds
a watermark with visually recognizable patterns, such as binary or
gray, in images by modifying the frequency part of the images. In
the proposed approach, an original image is decomposed into
wavelet coefficients. Then, an image watermarking scheme based on
the Simplified Significant Wavelet Tree (SSWT) is used to
achieve robust watermarking. Unlike other
watermarking techniques that use a single casting energy, SSWT
adopts adaptive casting energy in different resolutions. The
wavelet coefficients of the host image are grouped into wavelet
trees and each watermark bit is embedded using trees. The trees
are quantized so that they exhibit a large enough statistical
difference, which is later used for watermark extraction.
Each watermark bit is embedded in all frequency bands, which
renders the mark more resistant to attacks that remove certain
frequency components. The proposed
watermarking is robust to a variety of signal distortions, such as
image cropping, added noise, filtering, and compression
attacks.

Keywords-Simplified Significant Wavelet Tree, Wavelet Tree, 
watermarking, DWT. 



I. Introduction



Due to the open environment of Internet downloading, 
copyright protection raises a new set of challenging 
problems regarding the security and illegal distribution of 
privately owned images. One potential solution for declaring the 
ownership of images is to use watermarks [1-3]. 
Watermarking techniques apply minor modifications to the 
original data in a perceptually invisible or almost invisible 
manner, with the modifications bearing the watermark 
information. By detecting the existence of these modifications, 
we can prove ownership and even trace the source of an illegal 
copy [8]. 

Hsu and Wu [10] embedded watermarks with visually 
recognizable patterns in images by selectively modifying 
the middle-frequency DCT coefficients of the 
images. The embedding and extracting methods of the DCT-based 
approaches have been described in [10], [21], [22]. On the 
other hand, several methods [11-14], [16-20] used the discrete 
wavelet transform (DWT) to hide data in the frequency domain 
to provide extra robustness against attacks. Wang and Lin [9] 
proposed a wavelet tree quantization scheme for copyright 
protection watermarking. The wavelet coefficients are grouped 
into a predefined structure called a super tree. Watermark bits 
are embedded by quantizing super trees, and the resulting 
difference between quantized and unquantized trees is later used 
for watermark extraction. One well-known wavelet image/video 
coding method, embedded zerotree wavelet (EZW) coding [15], has 
the potential to play an important role in upcoming 
image/video compression standards, such as JPEG2000 and 
MPEG4, due to its excellent compression performance. 

In most previously proposed wavelet-based watermarking 
techniques the watermark is easily detected by employing 
detection theory. To overcome this, in the present paper we 
propose a watermarking approach based on Simplified Significant 
Wavelet Tree (SSWT) quantization, which adds visually 
recognizable images to the coefficients in all high frequency 
bands at all levels of the DWT of an image. In this process, 
we can identify the groups that are SSWT quantized during the 
watermark insertion process. We then select the SSWT 
non-quantized groups and SSWT quantize them to remove the 
watermark. Our experimental results show that the proposed 
watermarking approach is very robust to image compression 
and complicated image distortions. 

The present paper extends the study of wavelet trees and 
develops new concepts based on wavelet tree data structures 
to address two problems: (1) obtaining the best image quality 
for a given bit rate, and (2) rendering the watermark more 
resistant to frequency based attacks, i.e., achieving high 
robustness. These problems are important in many applications, 
particularly progressive transmission, image browsing, 
multimedia applications and compatible transcoding in a digital 
hierarchy of multiple bit rates. The approach is also applicable 
to transmission over a noisy channel, in the sense that ordering 
the bits by importance leads naturally to prioritization for the 
purpose of layered protection schemes. 

The remaining sections of this paper are organized as 
follows. Basic concepts about the DWT and wavelet trees are 
described in Section 2. Section 3 describes the watermark 
embedding approach and the extraction method. In Section 4, 
the experimental results are shown. The conclusions of our 
study are stated in Section 5. 
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B. Wavelet trees 



A. Wavelet theory and multiresolution analysis 

One of the oldest problems in statistics and signal 
processing is how to choose the size of an analysis window, 
block size, or record length of data so that statistics computed 
within that window provide good models of the signal behavior 
within that window. The choice of an analysis window 
involves trading the ability to analyze "anomalies", or signal 
behavior that is more localized in the time or space domain and 
tends to be wide band in the frequency domain, against the 
ability to analyze "trends", 
or signal behavior that is more localized in frequency but 
persists over a large number of lags in the time domain. To 
model data as being generated by random processes so that 
computed statistics become meaningful, stationarity and 
ergodicity assumptions are usually required, which tend to 
obscure the contribution of anomalies. 

The main contribution of wavelet theory and 
multiresolution analysis is that it provides an elegant 
framework in which both anomalies and trends can be analyzed 
on an equal footing. Wavelets provide a signal representation in 
which some of the coefficients represent long data lags 
corresponding to a narrow band, low frequency range, and 
some of the coefficients represent short data lags corresponding 
to a wide band, high frequency range. Using the concept of 
scale, data representing a continuous tradeoff between time (or 
space in the case of images) and frequency is available. 

In image processing, most of the image area typically 
represents spatial "trends", or areas of high statistical spatial 
correlation. However "anomalies", such as edges or object 
boundaries, take on a perceptual significance that is far greater 
than their numerical energy contribution to an image. 
Traditional transform coders, such as those using DCT, 
decompose images into a representation in which each 
coefficient corresponds to a fixed size spatial area and a fixed 
frequency bandwidth, where the bandwidth and spatial area are 
effectively the same for all coefficients in the representation. 
Edge information tends to disperse so that many nonzero 
coefficients are required to represent edges with good fidelity. 
However, since the edges represent relatively insignificant 
energy with respect to the entire image, traditional transform 
coders, such as those using the DCT, have been fairly 
successful at medium and high bit rates. At extremely low bit 
rates, however, traditional transform coding techniques, such as 
JPEG, tend to allocate too many bits to the "trends" and have 
few bits left over to represent "anomalies". As a result, 
blocking artifacts often appear. 

After an in-depth study of the above, the present work 
found that wavelet techniques show promising results at 
extremely low bit rates because trends, anomalies and 
information at all "scales" in between are available. A major 
difficulty is that fine detail coefficients representing possible 
anomalies constitute the largest number of coefficients. To 
overcome this, the proposed SSWT makes 
effective use of the multiresolution representation, in which 
much of the information is contained in representing the 
position of those few coefficients corresponding to significant 
anomalies. 



To improve the compression of significance maps of 
wavelet coefficients, a new data structure called SSWT is 
proposed in this paper. A parent child relationship can be 
defined between wavelet coefficients at different scales 
corresponding to the same location. Except for the highest 
frequency subbands (i.e., $HL_1$, $LH_1$ and $HH_1$), every 
coefficient at a given scale can be related to a set of coefficients 
at the next finer scale of similar orientation. The coefficient at 
the coarse scale is called the parent, and all coefficients 
corresponding to the same spatial location at the next finer 
scale of similar orientation are called children. For a given 
parent, the set of all coefficients at all finer scales of similar 
orientation corresponding to the same location are called 
descendants. Similarly, for a given child, the set of coefficients 
at all coarser scales of similar orientation corresponding to the 
same location are called ancestors. The fourth level wavelet 
decomposition is shown in Figure 1. The parent child 
dependencies are shown in Figure 2. Note that in Figure 2 the 
arrow points from the subband of the parents to the subband of 
the children. The lowest frequency subband is the top left and 
the highest frequency subband is at the bottom right. In this 
section, coefficients with local information in the subbands are 
chosen as the target coefficients to be cast. The coefficients 
selection approach of the proposed SSWT is derived from 
EZW, and the basic definitions are given as follows. 

Definition 1: A wavelet coefficient $x_n(i,j) \in D$ is a parent of 
$x_{n-1}(p,q)$, where $D$ is a subband labeled $HL_n$, $LH_n$, $HH_n$, 
$p = 2i-1 \mid 2i$, $q = 2j-1 \mid 2j$, $n > 1$, $i > 1$, and $j > 1$. 

Reference [6] presented the 
EZW algorithm for image compression using the zerotree of 
wavelet coefficients. The zerotree is defined as follows. Given 
an amplitude threshold value $T$, if a wavelet coefficient 
$x(i,j)$ satisfies $|x(i,j)| < T$, then $x(i,j)$ is said to be 
insignificant with respect to the threshold value $T$. If a 
coefficient and all of its descendants are insignificant with 
respect to $T$, then the set of these wavelet coefficients is 
called a zerotree for the threshold value $T$. An element of a 
zerotree for threshold value $T$ is a zerotree root if it is not 
the descendant of a previously found zerotree root for the 
threshold value $T$. The zerotree is 
based on the hypothesis that if a wavelet coefficient at a coarse 
scale is insignificant with respect to a given threshold value $T$, 
then all wavelet coefficients of the same orientation in the same 
spatial location at finer scales are likely to be insignificant 
with respect to $T$ [6]. 

Definition 2: If a wavelet coefficient $x_n(i,j)$ at the coarsest 
scale and its descendants $x_{n-k}(p,q)$ satisfy $|x_n(i,j)| < T$ and 
$|x_{n-k}(p,q)| < T$ for a given threshold $T$, then they are called 
wavelet zerotrees, where $1 \le k < n$. 

Definition 3: If a wavelet coefficient $x_n(i,j)$ at the coarsest 
scale satisfies $|x_n(i,j)| > T$ for a given threshold $T$, then 
$x_n(i,j)$ is called a significant coefficient [6]. 

Definition 4: If a wavelet coefficient $x_n(i,j) \in D$ at the 
coarsest scale is a parent of $x_{n-1}(p,q)$, where $D$ is a subband 
labeled $HL_n$, $LH_n$, $HH_n$, and they satisfy $|x_n(i,j)| > T_1$ and 
$|x_{n-1}(p,q)| > T_2$ for given thresholds $T_1$ and $T_2$, then 
$x_n(i,j)$ and its children are called a Qualified 
Significant Wavelet Tree (QSWT). Based on the above 
definitions, the present study derives a new definition of a 
wavelet tree. 
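
Definitions 1 and 4 can be illustrated with a short Python sketch. The function names are hypothetical and are not part of the original scheme. The sketch assumes the 1-based indexing of Definition 1 and reads Definition 4 as requiring every child to exceed $T_2$, which is one possible interpretation.

```python
def children(i, j):
    """Positions (p, q) of the four children of x_n(i, j) at scale n-1.

    Follows Definition 1 with 1-based indices:
    p is 2i-1 or 2i, and q is 2j-1 or 2j.
    """
    return [(p, q) for p in (2 * i - 1, 2 * i) for q in (2 * j - 1, 2 * j)]


def is_zerotree(values, t):
    """True if a coefficient and all listed descendants are insignificant."""
    return all(abs(v) < t for v in values)


def is_qswt(parent_value, child_values, t1, t2):
    """Definition 4 check (assumed reading: all children must exceed T2)."""
    return abs(parent_value) > t1 and all(abs(c) > t2 for c in child_values)


print(children(3, 2))
print(is_qswt(10, [5, -6, 7, 8], t1=9, t2=4))
```

The same index mapping can be applied repeatedly to enumerate all descendants of a root coefficient.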
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subband is scanned before any coefficient in the next finer 
subband in the proposed SSWT scheme. 



The Simplified Significant Wavelet Tree (SSWT) uses a DWT, 
which provides a compact multiresolution representation of the 
image. 

■ The SSWT coding provides a compact multiresolution 
representation of significance maps, which are binary maps 
indicating the positions of the significant coefficients. The 
proposed approach allows the successful prediction of 
insignificant coefficients across scales to be efficiently 
represented as part of exponentially growing trees. 

■ Successive approximation is used in the SSWT which 
provides a compact multi precision representation of the 
significant coefficients and facilitates the embedding 
algorithm. 

■ The SSWT uses a prioritization protocol whereby the 
ordering of importance is determined, in order, by the 
precision, magnitude, scale and spatial location of the 
wavelet coefficients. Note in particular, the larger 
coefficients are deemed more important than smaller 
coefficients regardless of their scale. 



III. PROPOSED SSWT SCHEME 



Definition of proposed SSWT: If any wavelet coefficient 
$x_n(i,j) \in D$ (other than a finest-scale wavelet coefficient, 
i.e., a leaf node) that is a parent of some $x_{n-1}(p,q)$, where 
$D$ is a subband labeled $HL_n$, $LH_n$, $HH_n$, satisfies 
$|x_n(i,j)| > T_1$ or $|x_{n-1}(i,j)| > T_2$ or $|x_{n-2}(i,j)| > T_3$ 
or $|x_{n-3}(i,j)| > T_4$ or $|x_{n-k}(i,j)| > T_k$ for given 
thresholds $T_1, T_2, T_3, T_4, \ldots, T_k$, then $x_n(i,j)$ and all 
of its children other than the finest-scale wavelet coefficients 
(the leaf nodes) are called a Simplified Significant Wavelet Tree 
(SSWT). 

The host image of size n by n is transformed into wavelet 
coefficients using the L level DWT. With L level 
decomposition, one obtains 3L+1 frequency bands. The 
proposed scheme is tested with four levels, as shown in 
Figure 1. When L = 4, the lowest frequency subband is located 
at the top left (i.e., the $LL_4$ subband) and the highest 
frequency subband is at the bottom right (i.e., the $HH_1$ 
subband). The relationship between these frequency bands, whose 
blocks are of variable size, can be seen as a parent child 
relationship. With the exception of the lowest frequency subband 
$LL_4$, the parent child relationship can be connected between 
these subband nodes to form a wavelet tree. If the root consists 
of more than one node, then an image will have many wavelet 
trees, as explained below. 
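
The count of 3L+1 subbands can be checked with a minimal pure-Python Haar decomposition. This is a sketch under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the plain averaging normalisation is chosen for simplicity, and the assignment of the names HL and LH to the two difference directions is a convention assumed here.

```python
def haar_step(img):
    """One 2D Haar level on a square, even-sized list of lists.

    Returns (LL, HL, LH, HH), each half the size of the input.
    Averaging normalisation and HL/LH naming are assumptions.
    """
    half = len(img) // 2
    ll, hl, lh, hh = ([[0.0] * half for _ in range(half)] for _ in range(4))
    for r in range(half):
        for c in range(half):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + d + e) / 4  # local average
            hl[r][c] = (a - b + d - e) / 4  # left-right difference
            lh[r][c] = (a + b - d - e) / 4  # top-bottom difference
            hh[r][c] = (a - b - d + e) / 4  # diagonal difference
    return ll, hl, lh, hh


def haar_dwt(img, levels):
    """Return a dict of subbands: HL1..HLn, LH1..LHn, HH1..HHn and LLn."""
    subbands = {}
    ll = img
    for n in range(1, levels + 1):
        ll, hl, lh, hh = haar_step(ll)  # decompose the current LL band again
        subbands[f"HL{n}"] = hl
        subbands[f"LH{n}"] = lh
        subbands[f"HH{n}"] = hh
    subbands[f"LL{levels}"] = ll
    return subbands


image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
bands = haar_dwt(image, 4)
print(len(bands))  # number of subbands for L = 4
```

For a 16x16 input and four levels, the coarsest subbands contain a single coefficient each, which makes the tree structure easy to inspect by hand.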

A wavelet tree descending from a coefficient in subband 
HH 4 of SSWT is shown in Figure 3. With the exception of the 
lowest frequency subband, all parents have four children. For 
the lowest frequency subband, the parent child relationship is 
defined such that each parent node has three children in the 
SSWT. In the proposed SSWT approach the scanning of the 
coefficients is performed in such a way that no child node is 
scanned before its parent. For an N scale transform, the scan 
begins at the lowest frequency subband, denoted as $LL_N$, and 
scans subbands $HL_N$, $LH_N$ and $HH_N$, at which point it moves on 
to the scale N-1, etc. Each coefficient within a given coarser 




In the proposed SSWT approach a higher level 
subband (e.g., the $HL_4$ subband) is more significant than a 
lower level subband (e.g., the $HL_2$ subband). The proposed 
SSWT does not consider the $LL_4$ subband as a root for embedding 
a watermark, since $LL_4$ is a low frequency band that contains 
important information about the image. The coefficients are 
grouped according to wavelet trees, except the coefficients of 
the LL band (A4,4). Therefore the coefficients in subbands A4,1, 
A4,2 and A4,3 form the roots of the wavelet trees. For a four 
level wavelet transform of a 512x512 image, the fourth-level 
subbands A4,1, A4,2 and A4,3 each have $32^2$ coefficients, and 
there are a total of $3 \times 32^2 = 3072$ trees in SSWT. Each 
tree consists of 1+4+16+64 = 85 coefficients, as shown in 
Figure 2. The coefficients are ordered from parent to children. 

For an image of size N x N at level two there are $(N/4)^2$ 
coefficients in each subband and a total of $3 \times (N/4)^2$ 
trees in the SSWT. For an image of size N x N at level four 
there are $(N/2^4)^2$ coefficients and a total of 
$3 \times (N/2^4)^2$ trees. In the same way, at level L, for an 
image of size N x N, there are $(N/2^L)^2$ coefficients and a 
total of $3 \times (N/2^L)^2$ trees in the SSWT. For coefficients 
in the subbands of the same level, a novel raster scanning order 
is proposed in SSWT, as shown in Figure 3. The j-th coefficient 
of a tree is denoted by $x(j)$, $1 \le j \le 85$. 
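
The counts quoted above can be verified with a few lines of Python. The helper name is hypothetical; the sketch only assumes that N is divisible by $2^L$ and that every tree node other than a leaf has four children.

```python
def sswt_tree_counts(n, levels):
    """Return (root subband side, number of trees, coefficients per tree)."""
    side = n // 2 ** levels                          # side of each coarsest subband
    trees = 3 * side ** 2                            # one tree per HL, LH and HH root
    per_tree = sum(4 ** k for k in range(levels))    # 1 + 4 + 16 + ...
    return side, trees, per_tree


side, trees, per_tree = sswt_tree_counts(512, 4)
print(side, trees, per_tree)
```

As a consistency check, the coefficients in all trees plus the $LL_4$ coefficients add up to the number of pixels in the image.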
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Figure 1 . Wavelet decomposition and its subbands 
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Figure2. Tree structure of wavelet coefficients and parent child relationship 
of SSWT 
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and the same process is repeated until the entire watermark bits 
are embedded. 

B. Watermark extraction process of SSWT 

To extract the watermark, the proposed method 
first transforms the watermarked image into four levels of the 
DWT. Then, wavelet trees are created as explained above and 
rearranged into 3072 trees. From these trees, significant SSWTs 
are identified using the preprocessing method, and 
watermark bits are extracted until eight consecutive zeros in the 
7th LSB and eight consecutive ones in the 6th LSB are reached. 
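
The parity-dependent LSB rule used by the insertion and extraction processes can be sketched as follows. This is an illustration only: the function names are hypothetical, coefficients are assumed to be non-negative integers, and "6th LSB" is taken to mean bit position 5 when counting from zero. Because only the 6th or 7th bit is changed, the parity of the coefficient is preserved, so the extractor can recover which bit position was used.

```python
def target_bit(value):
    """Zero-based bit position: 6th LSB for even values, 7th LSB for odd."""
    return 5 if value % 2 == 0 else 6


def embed_bit(value, bit):
    """Write one watermark bit (0 or 1) into the selected bit of value."""
    pos = target_bit(value)
    cleared = value & ~(1 << pos)   # reset the target bit
    return cleared | (bit << pos)   # set it to the watermark bit


def extract_bit(value):
    """Read the watermark bit back; parity is unchanged by embedding."""
    return (value >> target_bit(value)) & 1


marked = embed_bit(100, 0)
print(marked, extract_bit(marked))
```

The stopping rule based on eight consecutive zeros and ones is not modelled in this sketch.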



Figure 3. The 85 wavelet coefficients of a four level wavelet tree 
for an original image of size 512 x 512 and ordering of 
coefficients from coarser scale to finer scale of SSWT 

A. Watermark insertion process of SSWT 

The present work adopts various preprocessing steps for 
selecting significant subbands of SSWT. Preprocessing enhances 
the quality, illumination, contrast and sharpness of the image, 
which in turn improves its confidentiality, quality, data 
integrity and robustness. The preprocessing equations for the 
mean, median, mode, variance and standard deviation (SD) are 
given in Equations 1 to 5, respectively. 



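$\text{Mean} = \mathrm{int}\left(\frac{1}{n}\sum_{i}\sum_{j} P(i,j)\right)$ (1) 

$\text{Median} = \text{middle value of } \mathrm{ASC}\left(P(i,j)\right)$ (2) 

$\text{Mode} = \text{most frequent value of } P(i,j)$ (3) 

$\text{Variance} = \mathrm{int}\left(\frac{1}{n}\sum_{i}\sum_{j}\left(P(i,j) - \text{Mean}\right)^2\right)$ (4) 

$\text{SD} = \sqrt{\text{Variance}}$ (5) 

Here the sums run over the $n$ coefficients $P(i,j)$ of the region 
under consideration, int denotes conversion to an integer, and 
ASC denotes sorting in ascending order. 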

The watermark bits are embedded according to the ordered 
coefficients. In this scheme a watermark bit is inserted in the 
6th LSB or the 7th LSB of a coefficient if the coefficient value 
is even or odd, respectively. After embedding the watermark bits 
in the 85 coefficients as explained above, the next subband is chosen 



IV. EXPERIMENTAL RESULTS 

The proposed SSWT scheme is tested on the cover 
images Lena, Baboon, Peppers, Barbara, Monalisa, Lake, 
Cameraman and Child, each of size 512x512, as shown in Figure 4. 
The Haar wavelet transform is used in the proposed scheme. 
The watermark used in the experiments is the logo SRRF 
GIET of size 32x32, as shown in Figure 5. In the proposed 
scheme the mean preprocessing step is applied as the threshold; 
however, any of the preprocessing methods can be applied. The 
watermark is inserted in the selected locations using the 
above method. 
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Figure 4. Original images (a) Lena (b) Baboon (c) Peppers (d) Barbara 
(e) Monalisa (f) Lake (g) Cameraman (h) Child 

Figure 5. Watermark image: (a) logo SRRF GIET 

Table 1 lists the PSNR and NCC values for the proposed 
SSWT scheme. The PSNR values in Table 1 range from 38.08 dB to 
39.56 dB for the 8 images considered, which means the 
watermark is almost imperceptible. Figure 6 shows the 
watermarked images for the proposed scheme. 
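
The paper does not state the formulas behind the PSNR and NCC columns. The sketch below assumes commonly used definitions: PSNR computed from the mean squared error with a peak value of 255, and NCC as the inner product of the original and extracted watermarks normalised by the energy of the original. The function names are hypothetical, and images are passed as flat sequences (rows concatenated).

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def ncc(watermark, extracted):
    """Normalised cross-correlation (assumed definition, see lead-in)."""
    num = sum(a * b for a, b in zip(watermark, extracted))
    den = sum(a * a for a in watermark)
    return num / den


print(psnr([10, 20, 30, 40], [10, 20, 30, 41]))
print(ncc([1, 0, 1, 1], [1, 0, 0, 1]))
```

Other NCC variants normalise by the energy of both watermarks, which changes the value when the extracted watermark contains spurious ones.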
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Table II. PSNR and NCC results of various attacks on 
Lena image using SSWT scheme 



Figure 6. Watermarked images (a) Lena (b) Baboon (c) Peppers (d) Barbara 
(e) Monalisa (f) Lake (g) Cameraman (h) Child 



Table I. PSNR and NCC values for SSWT scheme 

S.No  Image      SSWT PSNR (dB)  SSWT NCC 
1     Lena       38.26           0.847 
2     Baboon     38.08           0.889 
3     Pepper     38.28           0.918 
4     Barbara    39.52           0.967 
5     Monalisa   38.44           0.946 
6     Lake       38.96           0.895 
7     Cameraman  39.56           0.848 
8     Child      38.52           0.859 


The proposed SSWT scheme with Haar wavelets is also 
tested under various attacks to assess its robustness: JPEG 
compression with different ratios (90%, 80%, 70% and 60%), 
Gaussian noise with different ratios (10%, 15%, 20% and 25%), 
cropping with different ratios (5%, 10%, 15% and 20%) and median 
filtering with different sizes (2x2, 3x3, 4x4 and 5x5). 
Table 2 shows the PSNR and NCC values under these attacks 
on the Lena image. From Table 2, it is 
evident that the proposed scheme retains a PSNR above 27 dB 
in all cases, even after the attacks. The NCC value stays above 
0.7 for all attacks except JPEG compression at 60% (0.668) and 
20% cropping (0.624), which indicates that the quality of 
the extracted watermark is largely preserved and that the scheme 
remains robust under the attacks considered. 

Table 3 compares the PSNR values obtained after inserting the 
watermark, without attacks, by the proposed SSWT scheme and by 
other existing methods [4, 5]. Table 3 indicates that 
SSWT outperforms the other existing methods. Figure 7 plots 
the comparison of the proposed SSWT scheme with the other 
methods. 



Attack                  Parameter  SSWT PSNR (dB)  SSWT NCC 
JPEG compression        90%        37.25           0.911 
                        80%        34.46           0.832 
                        70%        30.64           0.795 
                        60%        29.81           0.668 
Filtering               2x2        37.93           0.913 
                        3x3        37.44           0.891 
                        4x4        36.14           0.799 
                        5x5        34.9            0.743 
Adding Gaussian noise   10%        37.1            0.932 
                        15%        35.61           0.879 
                        20%        33.8            0.775 
                        25%        31.66           0.711 
Cropping                5%         33.28           0.865 
                        10%        31.83           0.812 
                        15%        29.32           0.765 
                        20%        27.67           0.624 

Table III. Comparison of the proposed SSWT scheme 
with other methods (PSNR in dB) 

Test image  LIU Hui and HU    Prayoth Kumsawat  Proposed SSWT 
            Yu-ping method    et al. method     method 
Lena        38.20             38.00             38.36 
Baboon      38.01             37.70             38.08 
Pepper      38.11             38.01             38.28 
Barbara     38.27             38.16             39.52 
Monalisa    37.99             37.89             38.44 
Lake        38.23             38.09             38.96 
Cameraman   38.12             38.12             39.56 
Child       38.25             38.01             38.52 

Figure 7. Comparison of proposed SSWT scheme with other methods 
(x-axis: images; series: the two compared methods and the 
proposed SSWT quantization approach) 

V. Conclusion 



The present paper demonstrated a novel scheme called 
SSWT, which is an extension of wavelet zerotrees. In the 
proposed scheme each watermark bit is embedded in various 
frequency bands and the information of the watermark bit is 
spread throughout large spatial regions. The proposed 
watermarking scheme achieves high perceptual quality of the 
watermarked image for human eyes and is highly 
robust to various malicious manipulations, 
including median filtering, low pass filtering, image rescaling, 
image cropping, JPEG, and JPEG2000 compression. The 
proposed scheme can provide an 
NCC of the extracted watermark as high as 0.9 when the 
watermarked image is attacked by JPEG compression with 
a quality factor as low as 40%. In addition to copyright 
protection, the proposed watermarking scheme can also be 
applied to data hiding or image authentication. The proposed 
approach is hierarchical and has multiresolution 
characteristics. In the proposed approach, the embedded 
watermark is hard to detect by human visual perception. The 
approach matches the upcoming image/video compression 
standards. 
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Abstract — This paper presents a novel method for isolated 
English word recognition based on a transform method. The 
isolated word recognition method consists of two phases: a feature 
extraction phase and a recognition phase. In feature extraction, 
the discrete Fourier transform and the discrete cosine transform 
are used to extract the features of speech samples, and feature 
vectors of different dimensions are obtained. In the recognition 
phase, the Euclidean distance is calculated between the test 
sample feature vector and all reference speech samples. The speech 
sample with the minimum average distance is selected. For testing, 
30 different words are used, spoken by both male and female 
speakers; 25 utterances of each word are recorded. Results are 
compared for different feature vector dimensions. Experimental 
results show that a maximum recognition rate of 94% is obtained. 

Keywords- Isolated word Recognition, Feature Extraction, Discrete 
Fourier Transform (DFT), Discrete Cosine transform (DCT), 
Euclidean distance. 

I. Introduction 

In this technological era, information technology continues 
to make more impact on many aspects of our daily lives, and 
the problems of communication between human 
beings and information processing machines become 
increasingly important. So far, such communication has been 
done almost entirely by means of keyboards and screens, but 
this method has substantial disadvantages for many 
applications. Speech is the most widely used and 
natural means of communication between humans, and it is an 
obvious substitute for keyboards and screens in 
the communication process. However, this deceptively simple 
means of exchanging information is, in fact, extremely 
complicated. Although the application of speech in the 
man-machine interface is growing rapidly, in their present forms 
machine capabilities for generating and interpreting speech are 
still incomplete and imperfect [1]. 

This research is concerned with speech recognition 
technology, which is part of speech and signal processing, as 
well as human computer interaction (HCI). Speech recognition 
is in high demand and has many useful applications. This 
research simulates speech recognition technology for 
telephone operator systems, where speech recognition 
technology is not yet commonly used by 
traditional telephone operators. Through this research, a new 
simulated telephone operator system is introduced that 
incorporates various components of Artificial Intelligence 
(AI), natural language processing, speech recognition 
technology and human computer interaction fundamentals [2]. 

The goal of a speech recognition system is to map the acoustic 
signal to a string of words. The recognition of speech is of 
paramount importance in applications where speech is the 
desirable input. As it allows natural interaction between man 
and machine without the use of a keyboard, this mode of input 
is increasingly gaining acceptance. Speech recognition 
systems acquire speech through a microphone and convert the 
speech into the uttered text [3]. 

Speech recognition is also defined as the technology by which 
sounds, words or phrases spoken by humans are converted into 
electrical signals, and these signals are transformed into 
coding patterns that can be identified by a computer. Based on 
this identification, the computer usually takes some actions. 
Speech recognition also refers to the ability of a machine or 
computer program to receive and interpret spoken commands 
and act upon those commands. An automated speech 
recognition system, using a microphone or a telephone as an 
input device, converts a person's speech into digital code by 
comparing the electrical patterns produced by the speaker's 
voice with a set of pre-recorded patterns stored in the database 
[4-5]. 

In recent years, automatic speech recognition has reached very 
high levels of performance, with word-error rates dropping by 
a factor of five in the past five years. This current state of 
performance is largely due to improvements in the algorithms 
and techniques that are used in this field. As a result, the 
accuracy level of Automated Speech Recognition (ASR) 
systems is improved especially when using a combination of 
various algorithms and feature extraction techniques [6]. 

There are various feature extraction techniques, including 
Linear Predictive Coding (LPC), Perceptual Linear Prediction 
(PLP) and Mel-Frequency Cepstral Coefficients (MFCC) [9]. 
Front-end (feature extraction) techniques share a set of common 
operations [7]. Those operations include sampling, with a 
preferred sampling rate for speech processing between 16000 and 
22000 samples per second, corresponding to 16 kHz to 22 
kHz. After the sampling, pre-emphasis is performed, followed 
by windowing, frequency warping, and adding the first and second 
derivative coefficients to the static feature coefficients in order 
to enhance the performance of the speech recognition system. 
The last operation is cepstral liftering, which is used to rescale 
the cepstral coefficients to have similar magnitudes. The most 
prevalent method used to extract spectral 
features is calculating Mel-Frequency Cepstral Coefficients 
(MFCC). MFCCs are one of the most popular feature 
extraction techniques used in speech recognition; they are 
frequency domain features computed on the Mel scale, which is 
based on the human ear scale. As frequency 
domain features, MFCCs are much more accurate than time domain 
features [9]. 

This paper presents a speech recognition technique using a 
transform method. In this method the Fourier transform is used for 
feature extraction and the Euclidean distance is used for feature 
matching. For speech recognition testing, thirty different 
computer related words spoken by different persons, both 
male and female, are used. 



II. Speech Recognition process 

Speech recognition in computer domain involves various 
steps. The steps required to make computers perform speech 
recognition are: Voice recording, feature extraction, and 
recognition with the help of knowledge models. Feature 
extraction in automated speech recognition (ASR) systems is 
the computation of a sequence of feature vectors which 
provides a compact representation of the given speech signal. 
Feature training is a process of enrolling or registering a new 
speech sample of a distinct word to the identification system 
database by constructing a model of the word based on the 
features extracted from the word's speech samples. 

Feature matching/testing is the process of computing a matching 
score, which is a measure of the similarity between the features 
extracted from the unknown word and the stored word models 
in the database. The unknown word is identified as the word with 
the minimum matching score in the database. 

The matching of an unknown word is performed by measuring 
the Euclidean distance between the feature vectors of the 
unknown word and the models of the known words in the 
database. The word with the smallest average minimum 
distance is picked, as shown in the equation below. 
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of frames. Each frame undergoes a sinusoidal transform (Fast 
Fourier Transform) in order to obtain certain parameters, 
which then undergoes filtering and decorrelation. The result is 
a sequence of feature vectors describing useful logarithmically 
compressed amplitude and simplified frequency information . 



$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$ (1) 

where $x_i$ is the i-th input feature vector, $y_i$ is the i-th 
feature vector in the database, and $d$ is the distance between 
$x_i$ and $y_i$. 
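
The matching rule of Equation (1) can be sketched in Python as follows. The function names and the layout of the reference database (a dictionary mapping each word to a list of stored feature vectors) are assumptions made for illustration, and the score for a word is taken to be the average distance to its stored vectors.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def recognise(test_vector, references):
    """Return the word whose reference vectors are closest on average.

    references: dict mapping word -> list of feature vectors (assumed layout).
    """
    best_word, best_score = None, math.inf
    for word, vectors in references.items():
        score = sum(euclidean(test_vector, v) for v in vectors) / len(vectors)
        if score < best_score:
            best_word, best_score = word, score
    return best_word


refs = {"one": [[0.0, 0.0], [1.0, 1.0]], "two": [[5.0, 5.0], [6.0, 6.0]]}
print(recognise([0.5, 0.5], refs))
```

In practice the feature vectors would have the dimension chosen in the feature extraction phase rather than two components.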



III. Speech Recognition Using Transform Method 

This method involves a frame-based analysis of a speech 
signal where the speech signal is broken down into a sequence 



The following steps are used for feature extraction for 
transform method of speech recognition. 

• Frame Blocking: Framing is the process of 
segmenting the speech sample obtained from the analog 
to digital conversion into small frames with a time length in 
the range of 10 ms to 40 ms. In this step the continuous 
speech signal is blocked into frames of N samples, with 
adjacent frames being separated by M samples (M < N). The first 
frame consists of the first N samples. The second frame 
begins M samples after the first frame, and overlaps it by 
N - M samples. Similarly, the third frame begins 2M 
samples after the first frame (or M samples after the 
second frame) and overlaps it by N - 2M samples. This 
process continues until all the speech is accounted for 
within one or more frames. The goal of the overlapping 
scheme is to smooth the transition from frame to frame. 
Typical values for N and M are N = 256 (which is 
equivalent to ~16 msec windowing and facilitates the fast 
radix-2 FFT) and M = 100. Fig. 1 shows the segmented 
speech sample with a frame size of 256 samples [10]. 
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Figure 1 Segmented Speech Signal (Frame Size = 256 samples) 

• Windowing: The next step in the processing is to window 
each individual frame so as to minimize the signal 
discontinuities at the beginning and end of each frame. 
The concept here is to minimize the spectral distortion by 
using the window to taper the signal to zero at the 
beginning and end of each frame. Windowing is 
necessary for working with short-term segments (frames) of the 
speech signal: it selects a portion of the speech signal that 
can reasonably be assumed to be stationary. 
It is performed in order to avoid any unnatural 
discontinuities in the speech segment and distortion in the 
underlying spectrum, and to ensure that all parts of 
the speech signal are recovered and possible gaps between 
frames are eliminated [11]. 
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Discrete Cosine Transform (DCT): In this final step, 
spectrum is converted back to time. The result is called 
the cepstrum coefficients. The cepstral representation of 
the speech spectrum provides a good representation of the 
local spectral properties of the signal for the given frame 
analysis. Because the Mel spectrum coefficients are real 
numbers, we can convert them to the time domain using 
the Discrete Cosine Transform (DCT). The formula is, 



Figure 2 Windowed Speech Segment 

The most commonly used window shape is the Hamming
window; Fig. 2 shows the windowed speech sample. The equation
of the Hamming window is

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1 \qquad (2)$$
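As a small illustration of Equation (2), the window can be generated and applied to one frame as below. The function names are hypothetical, and the single-sample case returning 1.0 is a convention chosen here to avoid division by zero.

```python
import math


def hamming(n_samples):
    """Return the Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples < 1:
        raise ValueError("the window needs at least one sample")
    if n_samples == 1:
        return [1.0]  # convention chosen here; N - 1 would be zero
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(frame):
    """Multiply each sample of a frame by the matching window value."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


windowed = apply_window([1.0] * 256)  # a constant frame exposes the window shape
```

The first and last values are 0.08 and the middle values approach 1, so the frame is tapered toward its edges, which is the purpose given for windowing above.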



• Fast Fourier Transform (FFT): Fourier analysis represents a
signal as a sum of sinusoids and so converts a speech signal
from the time domain to the frequency domain. The Fast
Fourier Transform converts each frame of N samples from the
time domain into the frequency domain. The FFT is a fast
algorithm to implement the Discrete Fourier Transform (DFT),
which is defined as
(3)
The result of this step is often referred to as the power
spectrum or periodogram. Fig. 3 shows the power spectrum
of the word 'One'; the diagram plots the output of the FFT
against frequency and time. The red color shows that most
of the energy is present between 0.8 and 1.2 sec.

Figure 3 Power Spectrum of word 'ONE'
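For illustration, a radix-2 FFT and the resulting power spectrum of one frame can be sketched in plain Python. It computes the standard DFT, $X_k = \sum_{n=0}^{N-1} x_n e^{-j 2\pi k n / N}$. The function names are hypothetical, and the $|X_k|^2 / N$ scaling is one common periodogram convention assumed here, not something the text specifies.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame):
    """Periodogram estimate |X_k|^2 / N (assumed scaling)."""
    n = len(frame)
    return [abs(c) ** 2 / n for c in fft(frame)]


# A cosine with 8 cycles per 256-sample frame should peak at bin 8.
frame = [math.cos(2 * math.pi * 8 * n / 256) for n in range(256)]
spectrum = power_spectrum(frame)
peak_bin = max(range(128), key=spectrum.__getitem__)
```

The frame size N = 256 is a power of two, which is why the text notes that it suits the radix-2 FFT.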



(4) 
By applying the procedure described above, a set of cepstrum
coefficients is computed for each overlapping speech frame of
around 16 msec. These coefficients are the result of a cosine
transform of the logarithm of the short-term power spectrum.
This set of coefficients is called a feature vector. Feature
vectors of dimension 8, 16 and 20 are computed [12].
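The last step, a cosine transform of the log power spectrum, can be sketched as below. Because Equation (4) is not legible in this copy, the unnormalized DCT-II is assumed here; the function names and the small floor that guards against log(0) are also assumptions.

```python
import math


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II: c_k = sum_i v_i * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def cepstral_features(power, n_coeffs=16, floor=1e-12):
    """Feature vector: DCT of the logarithm of a short-term power spectrum."""
    log_spectrum = [math.log(max(p, floor)) for p in power]  # floor avoids log(0)
    return dct_ii(log_spectrum, n_coeffs)


flat = dct_ii([1.0, 1.0, 1.0, 1.0], 3)          # a flat input has only a k = 0 term
features = cepstral_features([2.0] * 128, n_coeffs=16)
```

Setting `n_coeffs` to 8, 16 or 20 gives the three feature vector dimensions compared in the results.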



IV. Implementation Steps for Feature Extraction, 
Training and Matching 

1. The speech signal is stored in the form of wave files and is
read.

2. The speech signal is blocked into frames of N samples, with
adjacent frames being separated by M samples. Frame
blocking is carried out to reduce the mean squared prediction
error over a short segment of the speech waveform. In our case,
N = 256 and M = 100.

3. All the frames are stored in matrix M1.

4. A Hamming window is applied to each individual frame so as
to minimize the signal discontinuities at the beginning and end
of each frame.

5. Matrix M1 is transformed into a new matrix M2, where the
column vectors of M2 are the original frame vectors
transformed by the Hamming window.

6. The Discrete Fourier Transform is applied to matrix M2, and
M2 is transformed into a new matrix M3, where the column
vectors of M3 are the DFTs of the column vectors of M2.

7. The discrete cosine transform is applied to the DFT output to
convert it back into the time domain; the result is stored in a
new matrix and used as the feature vector matrix for the feature
matching phase.

8. For feature training, the feature vectors of all the reference
speech samples are calculated using the transform method.

9. For feature matching, an unknown word is selected from the
database as the test sample.

10. The feature vector is calculated for this test sample.

11. The Euclidean distance is calculated between the test sample
and all reference speech samples.

12. The speech sample with the minimum average distance is
selected.
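Steps 11 and 12 can be sketched as a nearest-reference search. This is an illustration under stated assumptions: each utterance is represented as a list of per-frame feature vectors, two utterances are compared frame by frame over their common length, and the names `average_distance` and `recognize` are hypothetical. The text does not say how utterances of different lengths are aligned.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two feature vectors, as in Equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_distance(test_frames, ref_frames):
    """Mean frame-to-frame distance over the frames both utterances have."""
    pairs = list(zip(test_frames, ref_frames))  # assumption: truncate to the shorter
    if not pairs:
        raise ValueError("empty feature matrix")
    return sum(euclidean(t, r) for t, r in pairs) / len(pairs)


def recognize(test_frames, references):
    """Return the word whose reference sample is closest to the test sample."""
    best_word, best_distance = None, math.inf
    for word, samples in references.items():
        for ref_frames in samples:
            distance = average_distance(test_frames, ref_frames)
            if distance < best_distance:
                best_word, best_distance = word, distance
    return best_word


# Toy two-dimensional features; real feature vectors have 8, 16 or 20 dimensions.
references = {
    "one": [[[0.0, 0.0], [0.0, 0.0]]],
    "two": [[[5.0, 5.0], [5.0, 5.0]]],
}
result = recognize([[1.0, 0.0], [0.0, 1.0]], references)
```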
Table II. Matching Rate (%) for 15 test samples of each word,
feature vector size = 16



VI. Results



A database of speech samples was prepared; the speech samples
used in this project were recorded using Windows Sound
Recorder 2010, 9.0.1. For each word, twenty-five utterances
from different speakers were collected, and samples were taken
from each speaker in two sessions so that both the training
model and the testing data could be created. Ten, five or three
samples of each word are used for the training phase, and
fifteen or ten of the remaining samples are used for the testing
phase. Table I shows the database description for speech
recognition. The samples were collected from eight different
speakers, both male and female, so that speaker-independent
speech recognition can be performed.



Table I. Database Description

Parameter                                            Sample characteristics
Language                                             English
No. of Speakers                                      10 (5 Male, 5 Female)
Sampling frequency, quantization                     16000 Hz, 16 bits
Average duration of training and testing utterance   1-2 sec
Total number of words                                30
Number of sample utterances per word                 25
Total number of utterances in database               30*25 = 750



The performance of speech recognition is evaluated in terms of
the recognition rate, which is computed using the following
measure:

$$\text{Recognition rate} = \frac{\text{Number of successful detections of word}}{\text{Number of words in testing set}} \qquad (5)$$
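As a quick check of Equation (5) against the totals reported in Table II, the rate can be computed as a percentage. The function name is hypothetical, and the multiplication by 100 is an assumption based on the table being expressed in percent.

```python
def recognition_rate(successes, total):
    """Recognition rate of Equation (5), expressed as a percentage."""
    if total <= 0:
        raise ValueError("the testing set must not be empty")
    return 100.0 * successes / total


rate_5_refs = recognition_rate(413, 450)    # total matches with 5 reference samples
rate_10_refs = recognition_rate(427, 450)   # total matches with 10 reference samples
rate_one_word = recognition_rate(14, 15)    # 14 of 15 test samples of a single word
```

Rounded to two decimals these give 91.78, 94.89 and 93.33; the averages of 91.77 and 94.88 printed in Table II appear to be the same values truncated rather than rounded.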

Table II shows the matching rate of 15 test samples for 05 and 10
reference samples of each word, for feature vector dimension
16. Here the matching rate is calculated out of 15 samples for
each word using equation (5). A total of 450 test samples are
used, and each test sample is compared with 150 and 300 different





                             Matching Rate (%)
WORD                         05 Reference Samples    10 Reference Samples
ONE                          93.33                   93.33
TWO                          73.33                   73.33
THREE                        100                     100
FOUR                         60                      60
FIVE                         93.33                   93.33
SIX                          100                     100
SEVEN                        100                     100
EIGHT                        100                     100
NINE                         100                     100
TEN                          100                     100
EXCELLENT                    100                     100
WALK                         100                     100
GOOD                         100                     100
GO                           73.33                   93.33
LIFT                         100                     100
UP                           93.33                   73.33
DOWN                         80                      100
THROW                        66.66                   100
STOP                         73.33                   80
CATCH                        100                     100
WELCOME                      100                     100
ENTER                        100                     100
AMAZING                      66.66                   100
BEAUTIFUL                    100                     100
SHUTDOWN                     80                      80
SHIFT                        100                     100
PRINT                        100                     100
COPY                         100                     100
PASTE                        100                     100
EXIT                         100                     100
Total Matches out of 450     413                     427
Avg Matching Rate (%)        91.77                   94.88
Table III. Comparison of Matching (%) of 450 test samples for different feature vector size

Columns: No. of Reference Samples for each word; Feature Vector Dimension 08, 16, 20 (Matches and % for each)

Figure 4 Feature vector Dimension Vs Matching Rate (%) of 450 test samples.



speech samples. In this case also, a 100% matching rate is
obtained for most of the words. The maximum matching rate is
observed when ten reference samples of each word are used.
Shaded cells show matching rates of 90% and above. For 5
reference samples a maximum average matching rate of 91.77%
is observed, and for 10 reference samples a maximum average
matching rate of 94.88%. Table III shows the comparison of
matching (%) out of 450 test samples for different numbers of
reference samples for each word, for feature vector dimensions
8, 16 and 20. With five reference samples for each word, the
matching accuracy is 90%, 91% and 89% for feature vector
dimensions 8, 16 and 20 respectively. Similarly, with ten
reference samples for each word, the matching accuracy is
92%, 94% and 90% for feature vector dimensions 8, 16 and 20
respectively. In both cases, the maximum matching accuracy
is obtained with feature vector dimension 16. Fig. 4 shows the
same results in graphical form.



VII. Conclusion



In this paper, a speaker-independent isolated word recognition
technique using the Transform method is presented. The results
were found to be satisfactory for a vocabulary of English words.
In the Transform method, speech samples are converted into the
frequency domain using the fast Fourier transform, and a maximum
accuracy of 94% is obtained.

Different recognition rates are obtained for different words,
because different phonemes are used in different words. The
maximum recognition rate with the transform method is obtained
for words like Two, Three, Walk, Excellent etc. The words
'Go' and 'Throw' have the same vowel part and differ only in
their unvoiced beginnings and endings, so these two words are
mostly misinterpreted as each other.
Abstract-The mobile and wireless industry is entering an
exciting time. Demand for mobile technology is growing at a
tremendous rate. Corporations are deploying mobile applications
that provide substantial business benefits, and consumers are
readily adopting mobile data applications.

We present a scientific application for mobile phones, developed
through the steps of a software engineering project: data
gathering, data analysis, designing, coding, packaging, testing
and deploying. The Mobile Scientific Calculator (MSC) enables the
user to compute mathematical operations on a mobile phone
without needing a separate calculator. The scientific calculator
offers three groups of keys: the four mathematical operations,
the four numeric systems, and many functions such as angle
functions, power, factorial and others. It is suitable for the
many mobile phones that do not have a scientific calculator
among their applications, and it provides a simple design that
makes its functions accessible to all users. It operates on more
than one mobile phone model.

Keywords - mobile application, scientific calculator. 

I. INTRODUCTION 

A scientific calculator is an important and necessary
application for every student and for anyone who works in a
scientific field. Many operations are difficult to compute with
the normal calculator that is available on all mobile phones,
which is designed for simple mathematical operations such as
addition, subtraction, multiplication and division, so the need
for a scientific calculator increases when dealing with purely
mathematical computations.

The scientific calculator offers most of the mathematical
operations that a student needs, such as the four numeric
systems, mathematical operations, factorial, mod, power, square,
etc. The mobile scientific calculator executes all of these
functions and operations in any of the four numeric systems; it
converts the digits to the decimal system for solving and then
reconverts the result to the chosen system.

A scientific calculator is not available on all mobile models,
so the MSC can be installed on mobiles that do not have one,
such as Nokia or Samsung models.

II. RELATED WORKS 

Some related works have studied the scientific calculator.
Jairus P. Ochanda and Francis C. Indoshi [1] showed the benefits
of using a scientific calculator in teaching and learning in
secondary school education. Christina L. Sheets [2] showed the
effectiveness of using a calculator in the classroom for
computing certain results, but that calculator was not a
scientific calculator. Helmut Dersch [3] designed a symbolic
calculator written for mobile phones and PDAs. It solves and
manipulates equations, handles basic calculus problems, and
provides a few more typical functions of computer algebra
systems, with no support for the four numeric systems.

Another related work on using a scientific calculator on mobile
devices is that of Xici Wang [4], who designed a portable digital
laboratory. It collects data from a sensor and sends the data to
a computer or a Graphing Calculator (GC) with the Data Streamer
software; this work uses a scientific calculator on a mobile
device together with a computer.

III. DATA GATHERING STEP 

In this step, information was gathered about designing user
interfaces, conversion algorithms among numeric systems, and
other mathematical function algorithms.

IV. DATA ANALYSIS STEP 

After gathering information and requirements in the
previous step, the analysis was started. It showed that the
necessary requirements for the scientific calculator application
were simple user interfaces, including buttons for entering
mathematical operations, special buttons for selecting the
numeric system, buttons for deleting, and a display for showing
digits.

The MSC supports the following operations:

• Mathematical operations (sum, sub, mult, div).

• Factorial.

• Power.

• Mod.

• Square.

• Pi, which is taken as 3.14.

• Other functions (sin, cos, tan).

In addition to the above operations, alert messages are shown
when an error occurs, such as division by zero, a result out of
range, or raising to a real-number power.
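The idea of computing in any numeric system by converting to decimal, solving, and reconverting can be sketched as follows. This is an illustration in Python, not the MSC's J2ME source. It assumes the four numeric systems are binary, octal, decimal and hexadecimal (the text does not name them), and `CalculatorError`, `calculate` and the operator spellings are hypothetical names.

```python
DIGITS = "0123456789ABCDEF"


class CalculatorError(Exception):
    """Raised where the calculator would show an alert message."""


def to_decimal(text, base):
    """Parse a digit string written in the given base."""
    try:
        return int(text, base)
    except ValueError:
        raise CalculatorError(f"'{text}' is not a valid base-{base} number") from None


def from_decimal(value, base):
    """Write an integer as a digit string in the given base."""
    if value == 0:
        return "0"
    sign, value = ("-" if value < 0 else ""), abs(value)
    digits = []
    while value:
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
    return sign + "".join(reversed(digits))


def calculate(left, operator, right, base=10):
    """Convert to decimal, compute, then reconvert to the original base."""
    a, b = to_decimal(left, base), to_decimal(right, base)
    if operator in ("/", "mod") and b == 0:
        raise CalculatorError("division by zero")
    operations = {
        "+": lambda: a + b,
        "-": lambda: a - b,
        "*": lambda: a * b,
        "/": lambda: a // b,   # assumption: integer division
        "mod": lambda: a % b,
    }
    if operator not in operations:
        raise CalculatorError(f"unknown operator '{operator}'")
    return from_decimal(operations[operator](), base)


binary_sum = calculate("1010", "+", "11", base=2)
hex_product = calculate("FF", "*", "2", base=16)
```

Raising a dedicated exception stands in for the alert messages listed above; a real interface would catch it and display the message.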
V. DESIGNING STEP

In this step the design of the program was carried out according
to the data flow diagram, as follows:

Figure (2) Second Level Mobile Scientific Calculator DFD

The interface of the MSC was designed using graphics
(canvas), as shown in the following:

Figure (5) MSC Interface



VI. CODING STEP 

The source code of the MSC was written in J2ME.

VII. COMPILATION AND PRECERTIFICATION STEP

This step was carried out after the coding step, once all of the
source code was complete. The compilation was performed on the
Connected Limited Device Configuration (CLDC 1.0 and CLDC 1.1)
platform with the Mobile Information Device Profile (MIDP 2.0).



Figure (3) Third Level Mobile Scientific Calculator DFD

VIII. PACKAGING STEP

This step was carried out after the code had been executed in
the development environment. All packages and necessary files
were then packaged, together with the application's descriptive
information, into one package.

IX. TESTING STEP 

The application was tested in the development environment as
follows:

Figure (6) Mobile Scientific Calculator Interfaces

X. DEPLOYING STEP

After testing the application in the emulator, the application
was deployed on mobile phones of different models. The
deployment of the MSC was achieved on MIDP 2.0 with CLDC
1.1.

XI. SCIENTIFIC CALCULATOR EVALUATION 

We performed a simple study in February 2012 of 51 students in
the 1st, 2nd and 4th classes of the software engineering
department about the use of a scientific calculator in their
exams. The 51 students comprised 21 students in the 1st class,
10 students in the 2nd class and 20 students in the 4th class.
The results of the evaluation are shown in the following figure:

Figure (7) Scientific Calculator Evaluation



XII. CONCLUSIONS 

The MSC helps students perform many operations. A student who
has a mobile phone will not need to carry a scientific
calculator after installing the MSC on his or her phone. The MSC
therefore reduces the number of devices that the student must
carry, specifically the scientific calculator.
Abstract-A hybrid-dimension association rules mining algorithm
that satisfies a definite condition on a multidimensional
transaction database is presented. A Boolean matrix based
approach is employed to generate frequent item sets in
multidimensional transaction databases. The algorithm scans the
database only once and then generates the association rules.
The Apriori property is used in the algorithm to prune the item
sets. It is not necessary to scan the database again; the
algorithm uses Boolean logical operations to generate the
association rules.

Keywords — Association Rule, Hybrid dimensional association 
rule, relational calculus, multidimensional transaction database. 

I. Introduction 

Different approaches have been used so far for mining
association rules in transactional or relational databases. The
Apriori algorithm is costly when it has to handle a huge
number of candidate sets, and it requires multiple scans of the
database, which is a tedious job. In situations with a
large number of frequent patterns, long patterns, or quite low
minimum support thresholds, an Apriori-like algorithm may
suffer from these problems, and it is used only for
single-dimensional mining. Although an FP-tree is rather compact,
its construction needs two scans of a transaction database,
which may represent a nontrivial overhead [3].

Finding frequent patterns plays an important role in data
mining and knowledge discovery techniques. An association rule
describes a correlation between data items in large databases or
datasets. The first algorithm to find frequent patterns was
presented by R. Agrawal et al. in 1993. The frequent-pattern
tree approach was presented for mining association rules
without candidate generation. The candidate generation-and-test
methodology, called the Apriori technique, was the first
technique to compute frequent patterns based on the Apriori
principle and the anti-monotone property. The Apriori technique
finds the frequent patterns of length k from the set of already
generated candidate patterns of length k-1. This algorithm
requires multiple database scans and a large amount of memory
to handle the candidate patterns when the number of potential
frequent patterns is reasonably large. In the past two decades,
a large number of research studies have been published
presenting new algorithms or extending existing algorithms to
solve the frequent pattern mining problem more effectively and
efficiently. But all the above-mentioned studies are suited to
single-dimensional transactional databases.

II. Association Rules 

Definition 1: Let $I = \{i_1, i_2, i_3, \ldots, i_n\}$ be a set of items. D is
a database of transactions. Each transaction T is a set of items
and has an identifier called TID. Each $T \subseteq I$.

Definition 2: An association rule is an implication of the form
$A \Rightarrow B$, where A and B are item sets which satisfy $A \subset I$,
$B \subset I$ and $A \cap B = \emptyset$.

Definition 3: The strength of an association rule can be
measured in terms of its Support and Confidence.

The support supp(X) of an item set X is defined as the
proportion of transactions in the data set which contain the
item set.

The confidence of a rule is defined as
$\mathrm{Conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$.

Definition 4: A Boolean matrix is a matrix whose elements are
'0' or '1'.

Definition 5: The Boolean AND operation is defined as
follows: 0.0=0, 0.1=0, 1.0=0, 1.1=1, where the logical AND
(conjunction) is denoted by '.'.
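Definitions 3 to 5 fit together as follows: if each item is a Boolean column over the transactions, the support of an item set is the sum of the AND of its columns divided by the number of transactions. The sketch below uses a small made-up data set, and the function names are hypothetical.

```python
def boolean_columns(transactions, items):
    """One 0/1 column per item: 1 where the transaction contains the item."""
    return {item: [1 if item in t else 0 for t in transactions] for item in items}


def and_columns(*columns):
    """Element-wise Boolean AND of one or more columns (Definition 5)."""
    return [int(all(bits)) for bits in zip(*columns)]


def support(column):
    """Proportion of transactions in which the column is 1 (Definition 3)."""
    return sum(column) / len(column)


def confidence(columns, antecedent, consequent):
    """Conf(X => Y) = supp(X u Y) / supp(X); assumes supp(X) > 0."""
    both = and_columns(*(columns[i] for i in antecedent + consequent))
    only_antecedent = and_columns(*(columns[i] for i in antecedent))
    return sum(both) / sum(only_antecedent)


# Illustrative transactions, not taken from the paper.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
columns = boolean_columns(transactions, ["a", "b", "c"])
supp_a = support(columns["a"])
supp_ab = support(and_columns(columns["a"], columns["b"]))
conf_a_b = confidence(columns, ["a"], ["b"])
```

Once the columns are built, no further pass over the transactions is needed; every support count comes from AND operations on the stored columns.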

There are different methods for generating frequent item sets
and mining association rules.
Some of them are as follows:

A. Apriori Algorithm 

The classical Apriori algorithm employs an iterative
method to find all the frequent item sets. First, the set of
frequent 1-item sets $L_1$ is found according to the
user-specified minimum support threshold; then $L_1$ is used to
find the frequent 2-item sets $L_2$, and so on, until no new
frequent item sets can be found. After finding all the frequent
item sets using Apriori, we can generate the corresponding
association rules [5]. Apriori employs an iterative approach
known as a level-wise search, where k-item sets are used to
explore (k+1)-item sets. Apriori principle: if an item set is
frequent, then all of its subsets must also be frequent. It
works in two steps. Join step: $C_k$ is generated by joining
$L_{k-1}$ with itself. Prune step: any (k-1)-item set that is
not frequent cannot be a subset of a frequent k-item set.

The Apriori algorithm is a simple single-dimensional mining
algorithm.
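The level-wise search with its join and prune steps can be sketched compactly. This is a minimal illustration on made-up transactions, with an absolute minimum support count; it makes no attempt at the efficiency of a production implementation, and the function names are hypothetical.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item of the item set."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frequent item set: support count} using level-wise search."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if count(frozenset([i]), transactions) >= min_count]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset, transactions)
        known = set(current)
        # Join step: combine frequent k-item sets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-item subset.
        candidates = [c for c in candidates
                      if all(frozenset(s) in known for s in combinations(c, k))]
        current = [c for c in candidates if count(c, transactions) >= min_count]
        k += 1
    return frequent


# Illustrative transactions, not taken from the paper.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(transactions, min_count=3)
```

Each call to `count` rescans the transactions, which is the repeated database scanning that the Boolean matrix approach aims to avoid.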






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 10, No. 2, February 2012 



B. Sampling Algorithm 

The main idea of the sampling algorithm is to select a small
sample of the database of transactions, one that fits in main
memory, and to determine the frequent item sets from that
sample. If those frequent item sets form a superset of the
frequent item sets for the entire database, then we can
determine the real frequent item sets by scanning the remainder
of the database in order to compute exact support values for the
superset item sets. A superset of the frequent item sets can
usually be found from the sample by using, for example, the
Apriori algorithm with a lowered minimum support.

C. Partition Algorithm 

In this algorithm, if we are given a database with a small
number of potential large item sets, say a few thousand, then
the support for them can be tested in one scan by using a
partitioning technique. Partitioning divides the database into
non-overlapping subsets; these are individually considered as
separate databases, and all large item sets for a partition,
called local frequent item sets, are generated in one pass. The
Apriori algorithm can then be used efficiently on each
partition if it fits entirely in main memory. Partitions are
chosen in such a way that each partition can be accommodated
in main memory.

D. FP-growth algorithm 

The FP-growth algorithm is an efficient method of mining all
frequent item sets without candidate generation. The
algorithm mines the frequent item sets by using a
divide-and-conquer strategy as follows: FP-growth first
compresses the database representing the frequent items into a
frequent-pattern tree, or FP-tree, which retains the item set
association information as well. The next step is to divide the
compressed database into a set of conditional databases (a
special kind of projected database), each associated with one
frequent item. Finally, each such database is mined separately.
The construction of the FP-tree and the mining of the FP-tree
are the main steps in the FP-growth algorithm.

In reality, for example, sales transactional databases store
other related information along with the items purchased, such
as quantity purchased, price and branch location. Additional
information about the customers who purchased the items, such
as customer age, occupation, credit rating, income and address,
is also stored in the database. Frequent item sets along with
this other relevant information are helpful in high-level
decision-making. This leads to the challenging task of
multilevel and multidimensional association rule mining. In
recent years, there has been a lot of interest in mining
databases with multidimensional data values.

III. Conditional Hybrid-Dimensional Association Rule Mining

Here I present the mining of conditional hybrid-dimensional
association rules. Items are marked as belonging to the main
attribute or to a subordinate attribute; based on these
markings, the algorithm performs either an intra-dimensional
join or an inter-dimensional join.

To solve these problems in finding frequent item sets, we
have proposed this algorithm. It mines hybrid-dimension
association rules not only from single-dimensional but also
from multidimensional databases. It meets the definite condition
to generate conditional hybrid-dimensional association rules
from a multidimensional transactional database. It scans the
database only once, which makes it easy to find large frequent
patterns. It does not generate candidate item sets as the
Apriori algorithm does; rather, it uses Boolean vector
"relational calculus" to generate frequent item sets. I take
multidimensional datasets with five attributes as input and
apply the hybrid-dimensional association rule algorithm to
generate association rules using Boolean matrices. I use SQL
Server as the back end and JDK 1.5 as the front end.

Methodology used in this project:

• Transforming the multidimensional transaction database
into two Boolean matrices: one for the subordinate attributes
($A_{m \times p}$) and one for the main attribute ($B_{m \times q}$).

• Generating the sets of frequent 1-itemsets $L_{A1}$ (from the
subordinate attributes matrix) and $L_{B1}$ (from the main
attribute matrix).

• Pruning the Boolean matrices.

• Performing AND operations to generate 2-itemsets:
$L_{A1} \bowtie L_{B1}$ and $L_{A1} \bowtie L_{A1}$ for inter-dimension join, and
$L_{B1} \bowtie L_{B1}$ for intra-dimension join.

• Repeating the process to generate (k+1)-item sets from $L_k$.

• Transforming the multidimensional transaction database into
Boolean matrix

• Generating the frequent 1-itemset $L_1$

• Pruning the Boolean matrix

• Generating the set of frequent k-item sets $L_k$

The generation of frequent item sets is the core of all
association rule mining algorithms. Previous studies on
mining multi-dimensional association rules focused on
finding non-repetitive predicate multi-dimensional rules. We
integrate single-dimensional mining and non-repetitive
predicate multi-dimensional mining, and present a method for
mining hybrid-dimensional association rules using a Boolean
matrix.

A. The join process 

There are two steps in the generation of the frequent item sets
and frequent predicate sets: joining and pruning.

(1) The join generating candidate 2-itemsets C2: we find the
frequent 1-itemsets based on each attribute, and at the same
time we mark the items that belong to the main attribute. The
marked items are thus the items of the main attribute, and the
unmarked items are the subordinate items. When we search
for C2, if both of the joining items are marked items, we
call the function for intra-dimensional join between the items
as well as inter-dimensional join; on the other occasions we
proceed only with inter-dimensional join.

(2) The join on other occasions: when we generate frequent
item sets directly according to the join mode of Apriori,
intra-dimensional joins as well as inter-dimensional joins
occur. But there are some restrictions on the generation of
intra-dimensional and inter-dimensional joins. Therefore we make
the following modifications to the joining step of Apriori. We
assume that the items within a transaction and an item set are
sorted in lexicographic order. We take two steps to find $L_k$:

• Distinguish the intra-dimensional join from the
inter-dimensional join: if all the items within the two
(k-1)-item sets belong to the main attribute, we proceed with
intra-dimensional join; we proceed with inter-dimensional join
on other occasions.

• Implement the join $L_{k-1} \bowtie L_{k-1}$, and choose the
corresponding joining condition according to the characteristic
of the join (intra-dimensional join or inter-dimensional join).

B. The conditional restriction in hybrid-dimension association rules

First the frequent item sets are obtained, and then we
generate the hybrid-dimension association rules from the
frequent item sets. Because we apply both intra-dimensional and
inter-dimensional joins, together with the conditional
restrictions, while generating the frequent item sets, all of
the frequent item sets have the following character: values of
the main attribute field may occur many times, while values of
each subordinate attribute field occur only once. Thus, the
rules generated by the algorithm may include many predicates,
or include the same predicate several times. In this way the
hybrid-dimension association rules are formed [1].
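The choice of join type and the restriction on subordinate attributes can be sketched as two small checks. This is one reading of the description above rather than the paper's code: the function names and the `DIMENSION_OF` mapping are hypothetical, and the item labels follow the worked example in the next section.

```python
def join_type(left, right, main_items):
    """Intra-dimensional if every joined item belongs to the main attribute."""
    return "intra" if (left | right) <= main_items else "inter"


def satisfies_restriction(itemset, dimension_of, main_dimension):
    """Each subordinate attribute may contribute at most one value."""
    seen = set()
    for item in itemset:
        dimension = dimension_of[item]
        if dimension == main_dimension:
            continue                      # main attribute values may repeat
        if dimension in seen:
            return False                  # second value from one subordinate attribute
        seen.add(dimension)
    return True


# Hypothetical marking of items by attribute (dimension).
DIMENSION_OF = {
    "y": "Age", "m": "Age",
    "h": "Income", "l": "Income",
    "I1": "Ordered_items", "I2": "Ordered_items",
}
MAIN_ITEMS = {item for item, dim in DIMENSION_OF.items() if dim == "Ordered_items"}

both_main = join_type({"I1"}, {"I2"}, MAIN_ITEMS)
mixed = join_type({"m"}, {"I1"}, MAIN_ITEMS)
allowed = satisfies_restriction({"m", "h", "I1", "I2"}, DIMENSION_OF, "Ordered_items")
rejected = satisfies_restriction({"y", "m"}, DIMENSION_OF, "Ordered_items")
```

A candidate such as {y, m} is rejected because a single transaction cannot carry two Age values, while {m, h, I1, I2} is allowed because only the main attribute repeats.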

IV. ALGORITHM 

The algorithm consists of the following steps:

1. Transforming the multidimensional transaction database
into two Boolean matrices: one for the subordinate attributes
($A_{m \times p}$) and one for the main attribute ($B_{m \times q}$).

2. Generating the sets of frequent 1-itemsets $L_{A1}$ (from the
subordinate attributes matrix) and $L_{B1}$ (from the main
attribute matrix).

3. Pruning the Boolean matrices.

4. Performing AND operations to generate 2-itemsets:
$L_{A1} \bowtie L_{B1}$ and $L_{A1} \bowtie L_{A1}$ for inter-dimension join,
and $L_{B1} \bowtie L_{B1}$ for intra-dimension join.

5. Repeating the process to generate (k+1)-item sets from $L_k$.

• Transforming the multidimensional transaction database
into Boolean matrix

• Generating the frequent 1-itemset $L_1$

• Pruning the Boolean matrix

• Generating the set of frequent k-item sets $L_k$

We integrate single-dimensional mining and non-repetitive
predicate multi-dimensional mining, and present a method for
mining hybrid-dimensional association rules using a Boolean
matrix. Consider a multi-dimensional transaction database Order,
which includes two subordinate attributes, Age and Income, and
one main attribute, Ordered_items, as given in Table I. In order
to simplify the implementation, we pre-processed some attributes
before the algorithm executes, as shown below in Table II and
Table III.

The multidimensional transaction table Order is
transformed into two Boolean matrices: $A_{m \times p}$ as the
subordinate attributes matrix and $B_{m \times q}$ as the main
attribute matrix. Let the minimum support be 0.4; m = 10 is
the number of transactions.



TABLE I
ORDER

ID    Age       Income    Ordered items
1     31..40    (V7K0     I1, I2, I5
2     31..40    TWO       I1, I2
3     31..40    9500      I1, I2, I5
4     21..30    4*50      I2, I4
5     41..50    7700      I1, I3
6     31..40    ftrsSO    I1, I2, I4
7     31..40    3500      I1, I3, I5
8     21..30    -WOO      I2, I5
9     21..30    3950      I1, I2, I3
10    21..30    5400      I3, I4

TABLE II
MAPPING AGE

Interval    Name
21..30      y
31..40      m
41..50      s

TABLE III
MAPPING INCOME

Interval       Name
4000-6000      l
6000-10,000    h

TABLE IV
ORDER SETS

ID    Age    Income    Ordered items
1     m      h         I1, I2, I5
2     m      h         I1, I2
3     m      h         I1, I2, I5
4     y      l         I2, I4
5     s      h         I1, I3
6     m      h         I1, I2, I4
7     m      l         I1, I3, I5
8     y      l         I2, I5
9     y      l         I1, I2, I3
10    y      l         I3, I4



Therefore min_sup_num = 0.4 × 10 = 4. We compute the sum of the
element values of each column in the Boolean matrices
$A_{10 \times 5}$ and $B_{10 \times 5}$ and prune the columns whose sum is
smaller than the minimum support number [7]. The sets of
frequent 1-itemsets are:

$L_{A1} = \{\{y\},\{m\},\{h\},\{l\}\}$, $L_{B1} = \{\{I1\},\{I2\},\{I3\},\{I5\}\}$.

Now we perform the 'AND' operation to join $L_{A1}$ and $L_{B1}$
(according to the type of join) to generate $L_2$. The possible
2-itemsets are obtained as follows. Inter-dimensional join
($L_{A1} \bowtie L_{B1}$ and $L_{A1} \bowtie L_{A1}$): it is performed by the AND
operation among the columns of matrices $A_{m \times p}$ AND $B_{m \times q}$,
and $A_{m \times p}$ AND $A_{m \times p}$. Intra-dimensional join
($L_{B1} \bowtie L_{B1}$): it is performed by the AND operation among the
columns of matrix $B_{m \times q}$ AND $B_{m \times q}$. The possible
2-itemsets from $L_{A1}$ and $L_{B1}$ are:
(y,l), (m,h), (h,I1), (h,I2), (h,I3), (h,I5), (l,I1), (l,I2),
(l,I3), (l,I5), (y,I1), (y,I2), (y,I3), (y,I5), (m,I1), (m,I2),
(m,I3), (m,I5), (I1,I2), (I1,I3), (I1,I5), (I2,I3), (I2,I5),
(I3,I5). After performing the 'AND' operation to get the support
numbers of these item sets, the Boolean matrices
$A_{10 \times 18}$ and $B_{10 \times 6}$ are generated. We again compute
the sums of the columns of matrices $A_{10 \times 18}$ and
$B_{10 \times 6}$ and prune the columns of the 2-itemsets that are
not frequent. The same process is repeated for the next higher
item sets.



[Boolean matrix; column groups: Age, Income, Ordered_items (I1, I2, I3, I5)]






We can generate a hybrid-dimension association rule such as:
$m \wedge h \wedge I1 \Rightarrow I2$ (Support = 40% and Confidence = 100%)
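The worked example can be retraced with Boolean columns. The sketch below is an illustration, not the authors' program. The ten rows are the mapped Order data as best recovered from the scanned tables, and the label "s" for the third age group is an assumption. The data is consistent with the text: it reproduces the stated frequent 1-itemsets and the stated rule.

```python
# (age, income, ordered items) per transaction, mapped as in the ORDER SETS table.
ORDERS = [
    ("m", "h", {"I1", "I2", "I5"}),
    ("m", "h", {"I1", "I2"}),
    ("m", "h", {"I1", "I2", "I5"}),
    ("y", "l", {"I2", "I4"}),
    ("s", "h", {"I1", "I3"}),          # "s": assumed label for the third age group
    ("m", "h", {"I1", "I2", "I4"}),
    ("m", "l", {"I1", "I3", "I5"}),
    ("y", "l", {"I2", "I5"}),
    ("y", "l", {"I1", "I2", "I3"}),
    ("y", "l", {"I3", "I4"}),
]
MIN_COUNT = 4  # minimum support 0.4 times 10 transactions


def build_columns(orders):
    """Build the subordinate (A) and main (B) Boolean matrices column by column."""
    sub = {name: [] for name in ("y", "m", "s", "h", "l")}
    main = {name: [] for name in ("I1", "I2", "I3", "I4", "I5")}
    for age, income, items in orders:
        for name in sub:
            sub[name].append(1 if name in (age, income) else 0)
        for name in main:
            main[name].append(1 if name in items else 0)
    return sub, main


def prune(columns, min_count):
    """Keep only the columns whose sum reaches the minimum support number."""
    return {name: col for name, col in columns.items() if sum(col) >= min_count}


def and_columns(*columns):
    """Element-wise Boolean AND of the given columns."""
    return [int(all(bits)) for bits in zip(*columns)]


sub, main = build_columns(ORDERS)
frequent_sub = prune(sub, MIN_COUNT)     # frequent subordinate 1-itemsets
frequent_main = prune(main, MIN_COUNT)   # frequent main 1-itemsets

antecedent = and_columns(sub["m"], sub["h"], main["I1"])
rule = and_columns(antecedent, main["I2"])
rule_support = sum(rule) / len(ORDERS)
rule_confidence = sum(rule) / sum(antecedent)
```

The database is read once to build the columns; every later support count, including those for the rule, comes from AND operations and column sums.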

V. EXPERIMENT 

To test whether the proposed method is fast, scalable
and effective, our experiments were run on a machine with an
Intel(R) Core 2 Duo 1.5 GHz processor and 1 GB of memory. The
operating system is Windows XP. We use a database that has
500 records and 13 attributes, each of which has 2-8 different
values. Execution time is given in milliseconds.



[Figure: Execution time with different record numbers.
Vertical axis: execution time in milliseconds (0 to 100,000);
horizontal axis: Category.]



VI. RESULT AND DISCUSSION 

The confidence of an association rule has a specific meaning:
when the antecedent of the rule is satisfied, the consequent
of the rule has a c% probability of being satisfied (here c
refers to the confidence of the rule). Only multidimensional
association rules can include several predicates in the
antecedent at the same time. We can therefore say that
prediction based on multidimensional association rules is
better and more precise than prediction based on
single-dimensional association rules. For example, TABLE I
presents a multidimensional transaction database Order. If we
make a single-dimensional association analysis on the
predicate Ordered_items, which presents the itemsets in a
transaction, the result of the analysis will only include the
relevance of the ordered itemsets. But if we make a
hybrid-dimension association analysis, the result includes
not only the relevance of the ordered itemsets but also the
relevance of the customers' information, e.g. Age and Income.
Thus, when we predict the product orders of customers by
means of the result of association analysis, the conditions
included in the antecedent of multidimensional association
rules are richer and will bring better prediction results.

VII. CONCLUSION

The proposed algorithm takes input datasets and a definite
condition and generates conditional hybrid-dimensional
association rules from a multidimensional transactional
database. Its main features are: it scans the database only
once, it does not generate candidate item sets, and it uses
the "relational calculus" to generate frequent item sets. It
stores data in the form of bits, so it needs less memory space
and can be applied to large relational databases.
Abstract:

Data Envelopment Analysis (DEA) is a non-parametric method
for evaluating the efficiency of Decision Making Units (DMUs)
using mathematical programming. There are several methods for
analyzing the efficiency of Decision Making Units, among
which are Charnes Cooper Rhodes (CCR) and Banker Charnes
Cooper (BCC), which compute the efficiency of Decision Making
Units using linear programming, and Wang's method, which
evaluates the efficiency of Decision Making Units using the
Ideal Decision Making Unit (IDMU) and the Anti-Ideal Decision
Making Unit (ADMU) and ultimately ranks the units. All these
calculations assume that all data, that is the inputs and the
outputs of the Decision Making Units, are positive and crisp.
The question then arises: if the data are symmetric
triangular fuzzy numbers, how should the efficiency of the
Decision Making Units be computed? In this article, we
introduce a method for evaluating the efficiency of Decision
Making Units, and for ranking them, using ideal and anti-ideal
Decision Making Units with fuzzy data. Finally, a numerical
example is given to illustrate the application of this method.

Key words: 

Data Envelopment Analysis (DEA), Ideal Decision 
Making Unit (IDMU), Anti Ideal Decision Making 
Unit (ADMU), relative closeness (RC), Ranking 

1. Introduction:

Data Envelopment Analysis (DEA), developed by Charnes and
Cooper [1], usually evaluates decision making units (DMUs)
from the angle of the best possible relative efficiency. If a
DMU is evaluated to have the best possible relative
efficiency of unity, then it is said to be DEA efficient;
otherwise it is said to be DEA inefficient. DEA efficient
DMUs are always thought to perform better than DEA
inefficient DMUs. If a DEA efficient DMU, however, also has a
poorer relative efficiency than a DEA inefficient DMU when
they are both evaluated from the angle of the worst possible
relative efficiency, can we still say that the DEA efficient
DMU performs better than the DEA inefficient DMU? It is
obvious that the answer is negative. There must therefore be
a method that combines the best and the worst relative
efficiencies to give a general assessment of each DMU. Entani
[2] considers the efficiency of DEA from both optimistic and
pessimistic viewpoints. Wang [3], in 2006, evaluated the
efficiency of DEA using ideal and anti-ideal decision-making
units.

Dug Hun Hong, in 2006, surveyed the minimum and maximum
operators for fuzzy numbers. The first paper on fuzzy DEA was
written by Sengupta [5] in 1992. Fuzzy DEA models can
represent real-world problems more realistically than
conventional DEA models. In Lertworasirikul's book [6] of
2002, several methods are offered for solving the fuzzy CCR
model. We can consider two approaches for solving fuzzy CCR.
The first one defuzzifies the fuzzy CCR model and changes it
into the equivalent crisp model, and the second one uses
α-levels to create an interval-valued linear program that
solves the fuzzy DEA by parametric programming. Tanaka,
Entani and Maeda [7] formulated two DEA models: one model
that gives the upper-limit (best case) efficiency and one
model that gives the lower-limit (worst case) efficiency.
With the defuzzification approach we first defuzzify the
fuzzy inputs and outputs into crisp values, and then solve
the resulting crisp model using an LP solver. There are
several defuzzification methods, such as the "COA",
"max-min", "MOM" and "max-max" methods; these methods are
used for trapezoidal and triangular membership functions. Liu
[8] also studied fuzzy DEA models. The idea of his
development is the same as that proposed by Entani, Tanaka
and Maeda. It falls into the category of parametric
programming.

In this article, ideal and anti-ideal decision-making units
are introduced. One evaluates DMUs with fuzzy data from the
viewpoint of the best possible relative efficiency, and the
other evaluates them from the viewpoint of the worst relative
efficiency. These two relative efficiencies are combined to
form the RC index, which expresses the relative closeness of
the units to the ideal decision-making unit. All of the data
of the decision-making units are fuzzy with symmetrical
triangular membership functions. At each α-level the fuzzy
inputs and outputs correspond to intervals [L, U]. An
interval-valued linear programming model is created from
these input and output intervals, and we evaluate the
efficiency of units with fuzzy data using the ideal and
anti-ideal units, and also rank the units at different
α-levels.

2. Fuzzy DEA models using ideal and anti-ideal
decision making units

Assume that there are n decision-making units (DMUs) to be
evaluated, each DMU with m inputs and s outputs. We denote
the inputs and outputs of $DMU_j$, $j = 1, 2, \ldots, n$, by
$\tilde{x}_{ij}$, $i = 1, 2, \ldots, m$, and $\tilde{y}_{rj}$,
$r = 1, 2, \ldots, s$, all of which are positive symmetrical
triangular fuzzy numbers. At each α-level, the fuzzy inputs
and fuzzy outputs correspond to intervals [L, U] on the
membership function, where L and U represent the lower and
upper bounds of the data at that α-level. Using these inputs
and outputs, an interval linear programming model is created.
The ideal decision-making unit, denoted IDMU, and the
anti-ideal decision-making unit, denoted ADMU, are defined as
follows:



Definition 1: The ideal decision making unit (IDMU) is a
virtual decision-making unit that has the least input and the
most output. The anti-ideal decision making unit (ADMU) is a
virtual decision-making unit that has the most input and the
least output.

The ith input of $DMU_j$, $i = 1, 2, \ldots, m$,
$j = 1, 2, \ldots, n$, can be defined at each α-level as

\[ (x_{ij})_\alpha = \left[ x_{ij}^L(\alpha),\; x_{ij}^U(\alpha) \right], \quad i = 1, 2, \ldots, m,\; j = 1, 2, \ldots, n. \]

Similarly, the rth output of $DMU_j$ at each α-level is

\[ (y_{rj})_\alpha = \left[ y_{rj}^L(\alpha),\; y_{rj}^U(\alpha) \right], \quad r = 1, 2, \ldots, s. \]

According to Definition 1, for each α-level we consider
$(x_i^{\min})_\alpha$, $i = 1, 2, \ldots, m$, and
$(y_r^{\max})_\alpha$, $r = 1, 2, \ldots, s$, respectively as
the inputs and the outputs of the ideal decision-making unit
(IDMU), and $(x_i^{\max})_\alpha$, $i = 1, 2, \ldots, m$, and
$(y_r^{\min})_\alpha$, $r = 1, 2, \ldots, s$, as the inputs
and the outputs of the anti-ideal decision-making unit
(ADMU), which are defined as follows:

\[ (x_i^{\min})_\alpha = \left[ \min_{j=1,\ldots,n} x_{ij}^L(\alpha),\; \min_{j=1,\ldots,n} x_{ij}^U(\alpha) \right], \quad i = 1, 2, \ldots, m \]

\[ (x_i^{\max})_\alpha = \left[ \max_{j=1,\ldots,n} x_{ij}^L(\alpha),\; \max_{j=1,\ldots,n} x_{ij}^U(\alpha) \right], \quad i = 1, 2, \ldots, m \]

\[ (y_r^{\min})_\alpha = \left[ \min_{j=1,\ldots,n} y_{rj}^L(\alpha),\; \min_{j=1,\ldots,n} y_{rj}^U(\alpha) \right], \quad r = 1, 2, \ldots, s \]

\[ (y_r^{\max})_\alpha = \left[ \max_{j=1,\ldots,n} y_{rj}^L(\alpha),\; \max_{j=1,\ldots,n} y_{rj}^U(\alpha) \right], \quad r = 1, 2, \ldots, s \]

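We denote the relative efficiency of the ideal
decision-making unit by $\theta_{IDMU}$. It is clear that the
ideal decision-making unit must have the best (i.e. the
highest) efficiency. According to the definition of relative
efficiency, the efficiency value at different α-levels under
the "Best-Best", "Worst-Worst", "Best-Worst" and "Worst-Best"
methods can be obtained by solving these models:

\[
\begin{aligned}
\max\ & \theta_{IDMU}^{B-B} = \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\min})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \le 0, \quad j = 1, 2, \ldots, n, \\
& u_r \ge 0,\ v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{1.1}
\]

And,

\[
\begin{aligned}
\max\ & \theta_{IDMU}^{W-W} = \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\min})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \le 0, \quad j = 1, 2, \ldots, n, \\
& u_r \ge 0,\ v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{1.2}
\]

And,

\[
\begin{aligned}
\max\ & \theta_{IDMU}^{B-W} = \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\min})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \le 0, \quad j = 1, 2, \ldots, n, \\
& u_r \ge 0,\ v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{1.3}
\]

And,

\[
\begin{aligned}
\max\ & \theta_{IDMU}^{W-B} = \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\min})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \le 0, \quad j = 1, 2, \ldots, n, \\
& u_r \ge 0,\ v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{1.4}
\]

Here $(\cdot)_\alpha^U$ and $(\cdot)_\alpha^L$ are the
maximum and minimum values, respectively, of the
corresponding fuzzy sets at the given α-level. The hypothesis
in the "Best-Best" method is that, at each α-level, the
smallest inputs and the largest outputs are used for every
DMU; in the "Worst-Worst" method, at each α-level, the
largest inputs and the smallest outputs are used for every
DMU. In the "Best-Worst" method, from the input and output
intervals at each α-level, the smallest inputs and the
largest outputs are used for the evaluated unit $DMU_o$,
while the largest inputs and smallest outputs are used for
all other DMUs. In the "Worst-Best" method, from the input
and output intervals at each α-level, the largest inputs and
smallest outputs are used for $DMU_o$, while the smallest
inputs and largest outputs are used for all other DMUs.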



We consider $\theta_{IDMU}^{B-B*}$, $\theta_{IDMU}^{W-W*}$,
$\theta_{IDMU}^{B-W*}$ and $\theta_{IDMU}^{W-B*}$
respectively as the optimal solutions of models (1.1), (1.2),
(1.3) and (1.4) (the best efficiency). Since these models may
have multiple optimal solutions, we consider the following
models for computing the efficiency of $DMU_o$, on the
condition that the relative efficiency of the ideal
decision-making unit (IDMU) remains unchanged, using the
"Best-Best", "Worst-Worst", "Best-Worst" and "Worst-Best"
methods at each α-level:

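\[
\begin{aligned}
\max\ & \theta_o^{B-B} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^U - \sum_{i=1}^{m} v_i \left( \theta_{IDMU}^{B-B*} (x_i^{\min})_\alpha^L \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \le 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{2.1}
\]

\[
\begin{aligned}
\max\ & \theta_o^{W-W} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^L - \sum_{i=1}^{m} v_i \left( \theta_{IDMU}^{W-W*} (x_i^{\min})_\alpha^U \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \le 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{2.2}
\]

\[
\begin{aligned}
\max\ & \theta_o^{B-W} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^U - \sum_{i=1}^{m} v_i \left( \theta_{IDMU}^{B-W*} (x_i^{\min})_\alpha^L \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \le 0, \quad j = 1, 2, \ldots, n,\; j \ne o, \\
& \sum_{r=1}^{s} u_r (y_{ro})_\alpha^U - \sum_{i=1}^{m} v_i (x_{io})_\alpha^L \le 0, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{2.3}
\]

\[
\begin{aligned}
\max\ & \theta_o^{W-B} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\max})_\alpha^L - \sum_{i=1}^{m} v_i \left( \theta_{IDMU}^{W-B*} (x_i^{\min})_\alpha^U \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \le 0, \quad j = 1, 2, \ldots, n,\; j \ne o, \\
& \sum_{r=1}^{s} u_r (y_{ro})_\alpha^L - \sum_{i=1}^{m} v_i (x_{io})_\alpha^U \le 0, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{2.4}
\]

The above models are linear programming problems in which
$u_r$ and $v_i$ are the decision variables. In these models,
$DMU_o$ is the decision-making unit under evaluation, and
$\theta_{IDMU}^{B-B*}$, $\theta_{IDMU}^{W-W*}$,
$\theta_{IDMU}^{B-W*}$ and $\theta_{IDMU}^{W-B*}$ are the
optimal solutions of models (1.1), (1.2), (1.3) and (1.4)
respectively.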

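Similarly, we can define the efficiency of the anti-ideal
decision-making unit (ADMU). It is clear that the efficiency
of ADMU is worse (lower) than that of the other
decision-making units. We have the following linear
programming models:

\[
\begin{aligned}
\min\ & \varphi_{ADMU}^{B-B} = \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\max})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \ge 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{3.1}
\]

And,

\[
\begin{aligned}
\min\ & \varphi_{ADMU}^{W-W} = \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\max})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \ge 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{3.2}
\]

And,

\[
\begin{aligned}
\min\ & \varphi_{ADMU}^{B-W} = \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\max})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \ge 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{3.3}
\]

And,

\[
\begin{aligned}
\min\ & \varphi_{ADMU}^{W-B} = \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_i^{\max})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \ge 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{3.4}
\]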

Let $\varphi_{ADMU}^{B-B*}$, $\varphi_{ADMU}^{W-W*}$,
$\varphi_{ADMU}^{B-W*}$ and $\varphi_{ADMU}^{W-B*}$ be the
optimal solutions for ADMU (the worst efficiency) under the
"Best-Best", "Worst-Worst", "Best-Worst" and "Worst-Best"
methods at different α-levels. We can now consider the
following linear programming models for determining the worst
efficiency of $DMU_o$, on the condition that the efficiency
of the anti-ideal decision-making unit (ADMU) remains
unchanged. Using the "Best-Best", "Worst-Worst", "Best-Worst"
and "Worst-Best" methods at each α-level we have:

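\[
\begin{aligned}
\min\ & \varphi_o^{B-B} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^U - \sum_{i=1}^{m} v_i \left( \varphi_{ADMU}^{B-B*} (x_i^{\max})_\alpha^L \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \ge 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{4.1}
\]

And,

\[
\begin{aligned}
\min\ & \varphi_o^{W-W} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^L - \sum_{i=1}^{m} v_i \left( \varphi_{ADMU}^{W-W*} (x_i^{\max})_\alpha^U \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \ge 0, \quad j = 1, 2, \ldots, n, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{4.2}
\]

And,

\[
\begin{aligned}
\min\ & \varphi_o^{B-W} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^U \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^L = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^U - \sum_{i=1}^{m} v_i \left( \varphi_{ADMU}^{B-W*} (x_i^{\max})_\alpha^L \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^L - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^U \ge 0, \quad j = 1, 2, \ldots, n,\; j \ne o, \\
& \sum_{r=1}^{s} u_r (y_{ro})_\alpha^U - \sum_{i=1}^{m} v_i (x_{io})_\alpha^L \ge 0, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{4.3}
\]

And,

\[
\begin{aligned}
\min\ & \varphi_o^{W-B} = \sum_{r=1}^{s} u_r (y_{ro})_\alpha^L \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i (x_{io})_\alpha^U = 1, \\
& \sum_{r=1}^{s} u_r (y_r^{\min})_\alpha^L - \sum_{i=1}^{m} v_i \left( \varphi_{ADMU}^{W-B*} (x_i^{\max})_\alpha^U \right) = 0, \\
& \sum_{r=1}^{s} u_r (y_{rj})_\alpha^U - \sum_{i=1}^{m} v_i (x_{ij})_\alpha^L \ge 0, \quad j = 1, 2, \ldots, n,\; j \ne o, \\
& \sum_{r=1}^{s} u_r (y_{ro})_\alpha^L - \sum_{i=1}^{m} v_i (x_{io})_\alpha^U \ge 0, \\
& u_r, v_i \ge 0 \quad \forall i, r.
\end{aligned}
\tag{4.4}
\]

Let $\theta_o^{B-B*}$, $\theta_o^{W-W*}$, $\theta_o^{B-W*}$
and $\theta_o^{W-B*}$ be the optimal solutions of problems
(2.1), (2.2), (2.3) and (2.4) respectively, and let
$\varphi_o^{B-B*}$, $\varphi_o^{W-W*}$, $\varphi_o^{B-W*}$
and $\varphi_o^{W-B*}$ be the optimal solutions of problems
(4.1), (4.2), (4.3) and (4.4) respectively; these represent
the best and the worst relative efficiency of $DMU_o$.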

It is clear that problems (1.1) to (1.4) and problems (2.1)
to (2.4), which depend on the ideal decision-making unit,
evaluate the best relative efficiency of the ideal
decision-making unit (IDMU) and of the other decision-making
units (DMUs). Likewise, problems (3.1) to (3.4) and (4.1) to
(4.4), which depend on the anti-ideal decision-making unit
(ADMU), evaluate the worst relative efficiency of the
anti-ideal decision-making unit (ADMU) and of the other
decision-making units (DMUs).

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 10, No. 2, 2012
http://sites.google.com/site/ijcsis/
ISSN 1947-5500

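Definition 2: Suppose that $\theta_{IDMU}^{B-B*}$,
$\theta_{IDMU}^{W-W*}$, $\theta_{IDMU}^{B-W*}$ and
$\theta_{IDMU}^{W-B*}$ are the best relative efficiencies of
IDMU under the four methods above, and $\theta_o^{B-B*}$,
$\theta_o^{W-W*}$, $\theta_o^{B-W*}$ and $\theta_o^{W-B*}$
are the best relative efficiencies of $DMU_o$ under these
methods. Suppose also that $\varphi_{ADMU}^{B-B*}$,
$\varphi_{ADMU}^{W-W*}$, $\varphi_{ADMU}^{B-W*}$ and
$\varphi_{ADMU}^{W-B*}$ are the worst relative efficiencies
of ADMU under the four methods, and $\varphi_o^{B-B*}$,
$\varphi_o^{W-W*}$, $\varphi_o^{B-W*}$ and $\varphi_o^{W-B*}$
are the worst relative efficiencies of $DMU_o$ under these
methods. Then the relative closeness (RC) of $DMU_o$ to IDMU
under the four methods is defined as follows:

\[ RC_o^{B-B} = \frac{\varphi_o^{B-B*} - \varphi_{ADMU}^{B-B*}}{\left( \varphi_o^{B-B*} - \varphi_{ADMU}^{B-B*} \right) + \left( \theta_{IDMU}^{B-B*} - \theta_o^{B-B*} \right)} \tag{5.1} \]

\[ RC_o^{W-W} = \frac{\varphi_o^{W-W*} - \varphi_{ADMU}^{W-W*}}{\left( \varphi_o^{W-W*} - \varphi_{ADMU}^{W-W*} \right) + \left( \theta_{IDMU}^{W-W*} - \theta_o^{W-W*} \right)} \tag{5.2} \]

\[ RC_o^{B-W} = \frac{\varphi_o^{B-W*} - \varphi_{ADMU}^{B-W*}}{\left( \varphi_o^{B-W*} - \varphi_{ADMU}^{B-W*} \right) + \left( \theta_{IDMU}^{B-W*} - \theta_o^{B-W*} \right)} \tag{5.3} \]

\[ RC_o^{W-B} = \frac{\varphi_o^{W-B*} - \varphi_{ADMU}^{W-B*}}{\left( \varphi_o^{W-B*} - \varphi_{ADMU}^{W-B*} \right) + \left( \theta_{IDMU}^{W-B*} - \theta_o^{W-B*} \right)} \tag{5.4} \]

It is clear that, for each of the four methods, the greater
the difference between $\varphi_o^{*}$ and
$\varphi_{ADMU}^{*}$ and the smaller the difference between
$\theta_{IDMU}^{*}$ and $\theta_o^{*}$, the better the
performance of $DMU_o$. Therefore, a bigger RC value under
the four aforementioned methods represents a better
performance of $DMU_o$. Since the RC index integrates both
the best and the worst possible relative efficiencies of each
DMU, it can be regarded as a general ranking of DMUs.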
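
As a small illustration of the RC index, the sketch below
evaluates the ratio of Definition 2 for one method at one
α-level. The function name relative_closeness and the numeric
inputs are hypothetical; in practice the four efficiency values
would come from solving the corresponding linear programs.

```python
def relative_closeness(theta_idmu, theta_o, phi_admu, phi_o):
    """RC of DMU_o: distance from ADMU over total distance to ADMU and IDMU."""
    gap_from_worst = phi_o - phi_admu   # how far DMU_o is above the anti-ideal unit
    gap_to_best = theta_idmu - theta_o  # how far DMU_o is below the ideal unit
    return gap_from_worst / (gap_from_worst + gap_to_best)


# Hypothetical efficiency values, chosen only to show the arithmetic.
rc = relative_closeness(theta_idmu=1.5, theta_o=1.0, phi_admu=0.25, phi_o=0.75)
```

With these illustrative numbers both gaps equal 0.5, so the unit
sits halfway between the anti-ideal and the ideal unit. The
sketch does not guard against both gaps being zero.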

3. An applied example 

Suppose 5 decision-making units with two fuzzy inputs and two
fuzzy outputs as shown in Table 1, all of which have
symmetrical triangular membership functions. The membership
functions are denoted by (c, d), where c is the center and d
is the spread of the membership function.

DMU (j)     1           2           3           4           5
Input 1     (4.0,0.5)   (2.9,0.0)   (4.9,0.5)   (4.1,0.7)   (6.5,0.6)
Input 2     (2.1,0.2)   (1.5,0.1)   (2.6,0.4)   (2.3,0.1)   (4.1,0.5)
Output 1    (2.6,0.2)   (2.2,0.0)   (3.2,0.5)   (2.9,0.4)   (5.1,0.7)
Output 2    (4.1,0.3)   (3.5,0.2)   (5.1,0.8)   (5.7,0.2)   (7.4,0.9)

Table 1. Decision Making Units with fuzzy inputs and outputs
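
To make the α-level intervals concrete, the following sketch
computes [L, U] for a symmetric triangular number (c, d) and then
the IDMU bounds for Input 1 and Output 1 of Table 1. It assumes
that the spread d is the half-width of the support, so that the
α-cut is [c - (1 - α)d, c + (1 - α)d]; the names alpha_cut and
bounds are invented for this illustration.

```python
def alpha_cut(center, spread, alpha):
    """Interval [L, U] of a symmetric triangular fuzzy number at level alpha."""
    half_width = (1.0 - alpha) * spread  # assumption: spread is the half-width
    return (center - half_width, center + half_width)


# Table 1 data as (center, spread) pairs, one entry per DMU.
INPUT_1 = [(4.0, 0.5), (2.9, 0.0), (4.9, 0.5), (4.1, 0.7), (6.5, 0.6)]
OUTPUT_1 = [(2.6, 0.2), (2.2, 0.0), (3.2, 0.5), (2.9, 0.4), (5.1, 0.7)]


def bounds(fuzzy_values, alpha, pick):
    """Apply pick (min or max) to the lower and upper bounds separately."""
    cuts = [alpha_cut(c, d, alpha) for c, d in fuzzy_values]
    return (pick(low for low, _ in cuts), pick(up for _, up in cuts))


x1_min = bounds(INPUT_1, 0.5, min)   # IDMU input 1 at alpha = 0.5
y1_max = bounds(OUTPUT_1, 0.5, max)  # IDMU output 1 at alpha = 0.5
```

Passing max for the inputs and min for the outputs gives the
corresponding ADMU bounds in the same way.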



For different α-levels, such as α = 0, 0.25, 0.5, 0.75, 1, we
calculate the efficiency and the ranking of the units using
the "Best-Best", "Worst-Worst", "Best-Worst" and "Worst-Best"
methods. Tables 2 to 5 each contain five sections. The first
section gives the efficiency of the decision-making units at
the different α-levels with the input-oriented CCR model
under the method in question. In the second section, the row
for IDMU gives the efficiency of the ideal decision-making
unit at the different α-levels, obtained from model (1.1),
(1.2), (1.3) or (1.4) depending on the method chosen, and the
other rows, except the row for ADMU, give the efficiency of
the decision-making units at the different α-levels, obtained
from model (2.1), (2.2), (2.3) or (2.4). In the third
section, the row for ADMU gives the efficiency of the
anti-ideal decision-making unit at the different α-levels,
obtained from model (3.1), (3.2), (3.3) or (3.4) depending on
the method chosen, and the other rows, except the row for
IDMU, give the efficiency of the decision-making units at the
different α-levels, obtained from model (4.1), (4.2), (4.3)
or (4.4). The fourth section gives the RC index, which
represents the relative closeness of the evaluated
decision-making unit to the ideal decision-making unit at the
different α-levels, computed with (5.1), (5.2), (5.3) or
(5.4) depending on the method chosen. Finally, the last
section gives the ranking of the decision-making units at the
different α-levels by RC index under the "Best-Best",
"Worst-Worst", "Best-Worst" and "Worst-Best" methods.

4. Conclusion

In this article, a method for defining ideal and anti-ideal
decision-making units when the data are fuzzy numbers was
introduced, together with an RC index which represents the
relative closeness of the evaluated decision-making unit to
the ideal decision-making unit. Considering different
α-levels and the Worst-Best, Best-Worst, Worst-Worst and
Best-Best methods, we can defuzzify the fuzzy models, compute
the efficiency of decision-making units with respect to the
ideal and anti-ideal decision-making units, and rank
decision-making units with fuzzy data by calculating the RC
index.
Abstract- String matching plays an important role in computer
science, and there are many string matching algorithms; the
important question is which algorithm should be used under
which conditions. The BM (Boyer-Moore) algorithm is the
standard benchmark among string matching algorithms, so here
we explain the BM algorithm and then its improvements: BMH
(Boyer-Moore-Horspool), BMHS (Boyer-Moore-Horspool-Sunday),
BMHS2 (Boyer-Moore-Horspool-Sunday 2), improved BMHS
(improved Boyer-Moore-Horspool-Sunday), BMI (Boyer-Moore
improvement) and CBM (composite Boyer-Moore). We also analyze
and compare them using an example and identify which one
performs better under which conditions.

Keywords-String Matching; BM; BMH; BMHS; BMHS2;
improved BMHS; BMI; CBM

I. INTRODUCTION 

In computer science, the Boyer-Moore string search algorithm
is a particularly efficient string searching algorithm, and
it has been the standard benchmark for the practical string
search literature.

It was developed by Bob Boyer and J Strother Moore in 1977.
The algorithm preprocesses the pattern string that is being
searched for in the text string. [5]

Before the BM algorithm was proposed, the direction of
character comparison was the same as the moving direction of
the pattern, i.e. both were from left to right. In BM the
direction of character comparison differs from the moving
direction of the pattern: characters are compared from right
to left within the pattern. [4]

After the BM algorithm was proposed, several algorithms were
proposed to improve it. In 1980, Horspool simplified the BM
algorithm and proposed the BMH algorithm. Although it only
uses the information of the table Right, the BMH algorithm
achieves good efficiency. In 1990 Sunday proposed the BMHS
algorithm, which improved the BMH algorithm. [6]

In 2010, Lin Quan Xie and Xiao Ming Liu proposed BMHS2, which
is based on an analysis of the BMHS algorithm. Its
improvement is that, when a match fails, the last text
character of the current window also takes part in deciding
the next attempt, together with the character that follows
it, and the case in which that following character occurs at
the first position of the pattern is considered separately.
[3]

In 2010 the BMI algorithm was proposed by Jingbo Yuan, Jisen
Zheng and Shunli Ding as an improvement of the BM algorithm.
The BMI algorithm combines the good-suffix function with the
advantages of BMH and BMHS. At the same time the BMI
algorithm also takes into account the singleness and
combination features of the Next-Character and the
Last-Character. [8, 9]

Two important factors influence the efficiency and speed of
pattern matching: the cost of finding the mismatching
character in the text string and the shift distance to the
right. On the basis of these two factors, an improved
algorithm called the Improved BMHS algorithm was given by
Yuting Han and Guoai Xu in 2010. [7]

Another improved algorithm, called composite Boyer-Moore, was
proposed in 2010 by Zhengda Xiong. The key issue of the
composite Boyer-Moore algorithm is how to utilize the
comparison history obtained in the previous iteration. To
this end a new two-dimensional table Jump[m][m] is
introduced. [4]

II. BM ALGORITHM

The BM algorithm scans the characters of the pattern from
right to left, beginning with the rightmost one. In case of a
mismatch (or a complete match of the whole pattern) it uses
two pre-computed functions to shift the window to the right.
These two shift functions are called the good-suffix shift
(also called the matching shift) and the bad-character shift
(also called the occurrence shift).

Assume that a mismatch occurs between the character P[i] = b
of the pattern and the character T[i+j] = a of the text
during an attempt at position j. Then
P[i+1 .. m-1] = T[i+j+1 .. j+m-1] = u and P[i] ≠ T[i+j]. The
good-suffix shift consists in aligning the segment
T[i+j+1 .. j+m-1] = P[i+1 .. m-1] with its rightmost
occurrence in P that is preceded by a character different
from P[i].

The BM algorithm computes the shift as follows.

(1) good-suffix function

The algorithm searches P from right to left for another
occurrence of the string u whose preceding character is not
b. If such a segment exists, P is shifted right to align it,
giving a new attempt window. If no such segment exists, the
shift consists in aligning the longest suffix v of
T[i+j+1 .. j+m-1] with a matching prefix of P.

(2) bad-char function

The bad-character shift consists in aligning the text
character T[i+j] with its rightmost occurrence in
P[0 .. m-2].

If T[i+j] does not occur in the pattern P, no occurrence of P
in T can include T[i+j], and the left end of the window is
aligned with the character immediately after T[i+j], namely
T[i+j+1].

The BM algorithm uses the good-suffix function and the
bad-char function to calculate the new comparing position,
shifting P rightward by the maximum of these two values. [1]
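
The bad-char function described above can be sketched as a small
lookup table. This is an illustrative reading, not the authors'
code: the names bad_char_table and bad_char_shift are invented
here, and the table stores, for each character of P[0 .. m-2],
its distance from the last pattern position.

```python
def bad_char_table(pattern):
    """Distance from the rightmost occurrence in P[0 .. m-2] to position m-1."""
    m = len(pattern)
    table = {}
    for k in range(m - 1):              # positions 0 .. m-2; later k overwrites
        table[pattern[k]] = m - 1 - k
    return table


def bad_char_shift(table, m, text_char, i):
    """Bad-character shift after a mismatch of P[i] against text_char."""
    # Characters absent from the table get the full pattern length m.
    return table.get(text_char, m) - (m - 1 - i)


table = bad_char_table("PATTERN")
shift_at_end = bad_char_shift(table, 7, "G", 6)  # mismatch on the last character
shift_inside = bad_char_shift(table, 7, "T", 2)  # mismatch deep inside the pattern
```

The second call returns a negative value, which shows why BM
takes the maximum of the bad-character and good-suffix shifts
instead of relying on the bad-character shift alone.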

Practice shows that the BM algorithm is fast for larger
alphabets. In the preprocessing phase, time and space
complexity is O(m + σ), where σ is the size of the finite
character set relevant to the pattern and text. In the
searching phase the time complexity is O(mn). There are 3n
text character comparisons in the worst case when searching
for a non-periodic pattern. The best-case time complexity is
O(n/m) and the worst-case time complexity is O(mn). [1]

Advantages

• Combining good-suffix and bad-char provides a good shift
value, because the maximum of the two is taken as the shift
value.

Disadvantages

• The preprocessing of good-suffix is complex to implement
and understand.

• The bad-char of the mismatched character may give a small
shift if the mismatch occurs after many matches.

Example: We have the text string
"STRINGMATCHINGISTOFINDTHEPATTERN" and the pattern "PATTERN",
which is to be found in the text string. We apply all of the
algorithms discussed in this paper to this example. The BM
example is shown in Table 1.



TABLE 1. BM Example (5 Shifts and 13 Comparisons)

III. IMPROVEMENT OF BM ALGORITHM

A. BMH Algorithm

Because the preprocessing of the good suffix is hard to
understand and implement, the BMH algorithm uses only the
bad-character shift. In the BMH algorithm, no matter where
the mismatch occurs, the distance of the shift to the right
is determined by the character in the text string that is
aligned with the last character of the pattern string. [7]

In the preprocessing phase, time complexity is O(m + σ),
where σ is the alphabet size. In the searching phase, time
complexity is O(mn). In the best case, time complexity is
O(n/m). Practical applications show that the BMH algorithm is
much more efficient than the BM algorithm. [2] [10]

Example: shown in Table 2.
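
A minimal Python sketch of the BMH idea follows. It is one
straightforward reading of the description above, not code from
the cited papers; the function name horspool_search is chosen
here for illustration.

```python
def horspool_search(text, pattern):
    """Return the start index of the first match of pattern, or -1 if none."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Shift for each character of pattern[0 .. m-2]; the rightmost occurrence wins.
    shift = {pattern[k]: m - 1 - k for k in range(m - 1)}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:  # compare right to left
            j -= 1
        if j < 0:
            return pos
        # The shift depends only on the text character under the last pattern cell.
        pos += shift.get(text[pos + m - 1], m)
    return -1


TEXT = "STRINGMATCHINGISTOFINDTHEPATTERN"
found_at = horspool_search(TEXT, "PATTERN")
```

The shift never looks at where the mismatch occurred, which is
what makes the preprocessing so much simpler than in BM.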



Advantages

• The good-suffix concept is removed, so the algorithm is
easy to implement.

• In case of a mismatch, the shift value is determined by the
bad-char value of the last character instead of the character
that caused the mismatch, so a larger jump is achieved using
bad-char than in BM.

Disadvantages

• The removal of good-suffix sometimes gives a smaller shift
than in BM.

TABLE 2. BMH Example (5 Shifts and 13 Comparisons)

B. BMHS Algorithm

The core idea lies in the calculation of the bad-char
function: it considers the next character, namely the text
character immediately after the current window (T[m] for the
first window), to determine the right shift. If this
character does not appear in the pattern, the pattern is
shifted by the pattern length + 1; otherwise, the shift is
the distance from the rightmost occurrence of that character
in the pattern to the end of the pattern, plus 1. During
matching, the pattern characters need not be compared in any
particular order; as soon as a mismatch is found, the
algorithm can skip as many characters as possible before the
next attempt, which improves the matching efficiency. [3]

The worst-case time complexity of the BMHS algorithm is
O(mn), and the best-case time complexity is O(n/(m+1)). For
short pattern strings, the algorithm is faster. [3]
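
The BMHS shift rule can be sketched as follows. This is an
illustrative implementation under the reading given above; the
name sunday_search is invented here, and the window is compared
with a plain slice comparison because the description leaves the
comparison order open.

```python
def sunday_search(text, pattern):
    """Return the start index of the first match of pattern, or -1 if none."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Distance from the rightmost occurrence of each character to the end, plus 1.
    shift = {ch: m - k for k, ch in enumerate(pattern)}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        if pos + m >= n:            # no character after the window is left
            break
        # The character just past the window decides the shift (m + 1 if absent).
        pos += shift.get(text[pos + m], m + 1)
    return -1


TEXT = "STRINGMATCHINGISTOFINDTHEPATTERN"
found_at = sunday_search(TEXT, "PATTERN")
```

Because the deciding character lies outside the window, the
largest possible shift is m + 1 rather than m.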



Example: shown in Table 3

Advantages

• In BMH the maximum shift achieved is equal to the pattern
length, but in BMHS the maximum shift that can be achieved is
one more than the pattern length.

Disadvantages

• If the last character is not in the pattern but the
next-to-last character is in the pattern, then on a mismatch
a smaller shift is achieved than with BMH.

TABLE 3. BMHS Example (4 Shifts and 13 Comparisons)

C. BMHS2 Algorithm

The idea of the algorithm is that, when a mismatch occurs at
any position, the right shift value is determined by the
Next-to-Last character and the Last character of the text
corresponding to the pattern, that is T[i+m] and T[i+m-1],
where m is the length of the pattern.

Matching starts from the last character of the pattern. If a
mismatch occurs at any position, consider the Next-to-Last
character (T[i+m]) of the text and find its position in the
pattern:

(1) If it is not in the pattern, then shift right by m+1.

(2) If it occurs at the first position, then shift right by
m.

(3) If it occurs at a position other than the first, let the
shift calculated from it be X. Then:

■ Consider the Last character of the text corresponding to
the pattern and calculate its shift; if the shift calculated
from it is also X, then shift by X.

■ Otherwise shift by m+1.

The worst-case time complexity of the BMHS2 algorithm is
O(mn) and the best-case time complexity is O(n), where n is
the length of the text; the maximum moving distance is m+1.
[3]

Example: shown in Table 4 
Advantages 

• This algorithm considers last character and next-to- 
last character both so it combined advantages of both 
BMH and BMHS. 

Disadvantages 

• Searching overhead increases as we have to take care 
of two characters for calculation of shift. 
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D. Improved BMHS Algorithm 

The improved algorithm uses the comparative order 
from right to left. Supposing that the pattern string POPle 
Pm -i aligns with the part of the text string Tk -m+i e Tk-i Tk. 

The preprocessing phase is as follows: construct the 
array Skip[x] according to the bad-character rules, in the 
conditions of x € X. In addition, improved algorithm needs 
to construct Num\y] which records the times of each 
character appearing in the pattern string. 

The searching phase is as follows: compare the 
character $P_{m-1}$ with $T_k$. 

When a mismatch occurs between $P_{m-1}$ and $T_k$, calculate 
Skip[$T_{k+1}$] and Skip[$T_{k+2}$]. If Skip[$T_{k+1}$] is equal to one, 
the pattern string shifts one position to the right. Otherwise, 
the shift is the larger of Skip[$T_{k+1}$] and Skip[$T_{k+2}$]. 

When $P_{m-1}$ and $T_k$ match, compare the character $P_{m-2}$ 
with the character $T_{k-1}$. If they match, continue by comparing 
$P_{m-3}$ with $T_{k-2}$, $P_{m-4}$ with $T_{k-3}$, and so on, until 
the pattern is matched completely. If a mismatch occurs, for 
example at $P_{m-4}$ and $T_{k-3}$, calculate Skip[$T_{k+1}$] and 
Skip[$T_{k+2}$]. If Skip[$T_{k+1}$] is equal to one, check whether 
Num[$P_{m-1}$] is equal to one; if it is, change Skip[$T_{k+1}$] to 
m+1. Then compare Skip[$T_{k+1}$] and Skip[$T_{k+2}$] and select 
the larger one as the shift of the pattern. [2] 

In the preprocessing phase, the time complexity is O(m+s). In 
the searching phase, in the worst case, if the successful match 
takes place at $T_i$, there are (i-1)*m comparisons before the 
successful match and m comparisons during the i-th attempt, 
so there are i*m comparisons in total. The worst-case time 
complexity is therefore O(mn). 

In the best case, if the successful match takes place at $T_i$, 
there are i/(m+2) comparisons before the successful match and 
m comparisons during the i-th attempt, so there are 
m + i/(m+2) comparisons in total. The best-case time 
complexity is O(n/(m+2)). [2] 
Example: shown in Table 5 

Advantages 

• Maximum shift that can be achieved using this 
algorithm is pattern length + 2. 

Disadvantages 

• Calculating the shift from the Next-to-Last and 
Next-to-Next-to-Last characters increases the searching 
overhead, and the required preprocessing of Num[ ] 
increases the preprocessing overhead. 



TABLE 5. Improved BMHS Example (4 Shift and 12 Comparisons) 

E. BMI Algorithm 

The BMI algorithm combines the good-suffix function 
with the advantages of BMH and BMHS [8] [9]. At 
the same time, the BMI algorithm takes into account 
the Next-Character and the Last-Character both singly 
and in combination. 



The basic idea behind the algorithm is to achieve the 
maximum shift distance in the event of a mismatch. 
Assume that P[0]...P[m-1] currently corresponds to 
T[i]...T[i+m-1] during the attempt. If a mismatch occurs, 
the right-shift position is calculated with the functions 
OneChar(x) and TwoChar(x), defined by formulas (1) and 
(2). [1] 

$$\mathrm{OneChar}(x) = \begin{cases} -1 & \text{character } x \text{ not in pattern} \\ j & \text{the rightmost position of character } x \text{ in pattern} \end{cases} \tag{1}$$

$$\mathrm{TwoChar}(x) = \begin{cases} -1 & \text{two characters } T[j+m-1]\,x \text{ not in pattern} \\ j & \text{the rightmost position of characters } T[j+m-1]\,x \text{ in pattern} \end{cases} \tag{2}$$



Definition: The Last-Character is the rightmost 
character of each attempt window in text T. The 
Next-Character is the first character to the right of the 
attempt window in text T. 
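
Formulas (1) and (2) can be illustrated with a short sketch. It is an illustration only: the function names `one_char` and `two_char` are hypothetical, and it assumes that the "position" of a two-character combination is the index of its first character, which the text does not state explicitly.

```python
def one_char(pattern, x):
    """Formula (1): rightmost position of character x in the pattern, or -1."""
    return pattern.rfind(x)


def two_char(pattern, last, x):
    """Formula (2): rightmost position of the pair (last, x) in the pattern, or -1.

    `last` plays the role of T[j+m-1]; the returned index is that of the
    first character of the pair (an assumption made for this sketch).
    """
    return pattern.rfind(last + x)


pattern = "abcab"
```

For this pattern, `one_char(pattern, "b")` is 4, `two_char(pattern, "a", "b")` is 3, and both functions return -1 for characters or pairs that do not occur.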

If m comparisons have been completed and 
$T_i T_{i+1} \ldots T_{i+m-1} = P_0 P_1 \ldots P_{m-1}$, the matching is 
successful. If $(P_j = a) \neq (T_{i+j} = b)$ in the (m-j)-th 
comparison, the BMI algorithm calculates the jump shift by the 
methods below. Denote 
$T_{i+j+1} T_{i+j+2} \ldots T_{i+m-1} = P_{j+1} P_{j+2} \ldots P_{m-1} = u$ and $T_{i+j} \neq P_j$. 

(1) Calculate the jump shift using the Last-Character d in the 
pattern and the OneChar function. The algorithm looks up the 
position of the first occurrence of the Last-Character d 
from right to left in $P_0 P_1 \ldots P_{m-2}$. If the position is 
found, the pattern P shifts right to align with character d. If 
it is not found, the pattern P shifts right to align with the 
right side of character d. Then the algorithm begins to compare 
in the new attempt window. 

(2) Calculate the jump shift using the Next-Character c and the 
OneChar function. The algorithm looks up the position of 
the first occurrence of the Next-Character c from right to left 
in $P_0 P_1 \ldots P_{m-1}$. If the position is found, the pattern P 
shifts right to align with character c. If it is not found, 
the pattern P shifts right to align with the right side of 
character c. Then the algorithm begins to compare in the new 
attempt window. 

(3) Calculate the jump shift using the Last-Character d, the 
Next-Character c and the TwoChar function. Denote X as the 
combination of characters b and c, that is, X=bc. The 
algorithm looks up the position of the first occurrence of X 
from right to left in $P_0 P_1 \ldots P_{m-1}$. If the position is 
found, the pattern P shifts right to align with character b. If 
it is not found, the pattern P shifts right to align with the 
right side of character b. Then the algorithm begins to compare 
in the new attempt window. 

TABLE 6. BMI Example 

In the case of a mismatch, the BMI algorithm combines 
three different shift functions to maximize the number of 
characters that can be skipped. 

If the Last-Character d matches the rightmost 
character of the pattern, the algorithm calculates the jump 
shift using the above three methods and takes the maximum of 
the results as the final jump shift. Otherwise, the algorithm 
calculates the jump shift using methods (1) and (2) above 
and takes the maximum value as the final jump shift. [1] 

In the best case, the time complexity of the BM and 
BMH algorithms is O(n/m) and the time complexity of the 
BMHS and BMI algorithms is O(n/(m+1)), but the 
average time complexity of the BMI algorithm is better. [1] 
Example: shown in Table 6 

Advantages 

• BMI uses the last character, the Next-to-Last character and 
the combination of these two characters to calculate the 
shift. It therefore takes advantage of BMH, BMHS and the 
good-suffix feature of BM applied to the combination of 
the last character and the Next-to-Last character. 

Disadvantages 

• Calculating the shift by three different methods 
and taking the maximum of them increases the searching 
overhead. 



(4 Shift and 11 Comparisons) 

F. CBM Algorithm 

The key issue of the CBM algorithm is how to utilize 
the comparison history obtained in the previous 
iteration. To do so, we construct a two-dimensional table 
Jump[m][m]. Jump[i][j] denotes the shift distance of pattern 
P when the mismatch in the previous iteration appears at p[i] 
and the mismatch in the current iteration appears at p[j]. This 
table depends only on pattern P. Once Jump[m][m] is 
constructed, it can be used to search for P in different 
texts. 

The comparison principle of the CBM algorithm is shown 
in Figure 1. Suppose P is at place P0 in the previous iteration 
and the mismatch appears at index i of P0, and suppose P is 
at place P1 in the current iteration and the mismatch appears at 
index j of P1. Then P2, the new position of P, must meet the 
following conditions: its substring at B matches P1's 
substring at B; its character at b does not match P1's 
character at j; its substring at A matches P0's substring at A; 
and its character at a does not match P0's character 
at i. These four matching conditions yield a large shift 
distance Jump[i][j] for pattern P. [4] 

Figure 1. Working principle of CBM 

In the procedure, the initial value of Jump[i][j] is set to 
Jump[j] for every i. The values are then increased gradually 
by testing until they satisfy the four matching conditions above. 
After the table Jump[m][m] has been generated, the matching 
process itself is similar to that of the BM algorithm. 

For a small alphabet and a long pattern, the values in 
Jump[m][m] that are close to the right column are usually 
larger than the corresponding values in Jump[m], and the 
matching efficiency is improved. Binary searching in 
computer science and DNA sequence tests in genetic 
engineering are applications of this kind. [4] 

IV. Comparison and Analysis 

The BMH algorithm is more efficient when the last character 
does not occur in the pattern. BMHS is more effective than 
BMH when the last character occurs in the pattern but the 
next-to-last character does not. The Improved BMHS 
algorithm is efficient when neither the next-to-last character 
nor the next-to-next-to-last character occurs in the pattern. 
BMHS2 performs better when the next-to-last character does not 
occur in the pattern or occurs at its first position. The BMI 
algorithm performs better when the Next-to-Last character does 
not occur in the pattern, when the Last character does not occur 
in the pattern, or when the combination of the Last character 
with the Next-to-Last character does not occur in the pattern. 
CBM is effective for a small alphabet and a long pattern, as in 
binary searching. 

Analysis Based on Example: 

• In our example the performance of BM and BMH was equal, 
with SHIFT=5 and Comparison=13. 

• With BMHS, SHIFT decreases to 4 but Comparison 
remains 13, so we cannot say that BMHS always 
performs better than BMH; it depends entirely on the input. 

• Improved BMHS performs better than BM, BMH 
and BMHS, with SHIFT=4 and Comparison=12. 

• BMI and BMHS2 perform even better than 
Improved BMHS, with SHIFT=4 and Comparison=11. 

• In the example the performance of BMI and BMHS2 is equal, 
but we cannot say that their performance is always 
the same; it also depends on the input. 

Table 7. Comparison (column: worst-case time) 

Analysis Based on Experiment: 

Experimental Environment 
Processor: i7 
RAM: 8 GB 
OS: Windows 7 
Language: Visual C++, run in Visual Studio 2008 

Experimental Data 
Text file: size 2,68,196 KB, containing a large number of 
occurrences of the pattern. 
Pattern: length 15 

Experiment 

In the experiment we searched for a pattern in the text and 
counted the number of comparisons, that is, how many times 
a pattern character is compared with a text character. The 
search time was also measured, in milliseconds. The search 
time and number of comparisons for the different 
algorithms are shown in Table 8. 
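
The two measurements can be sketched for one of the algorithms as follows. This is an illustrative Python version of Boyer-Moore-Horspool (BMH) with a comparison counter, not the Visual C++ code used in the experiment; the function name `bmh_search` and the toy text are assumptions made for the example.

```python
import time


def bmh_search(text, pattern):
    """Boyer-Moore-Horspool search that also counts character comparisons.

    Returns (list of match positions, number of comparisons).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    # Shift table built from every pattern character except the last.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches, comparisons = [], 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left, counting every character comparison.
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift according to the text character under the last pattern position.
        pos += shift.get(text[pos + m - 1], m)
    return matches, comparisons


start = time.perf_counter()
matches, comparisons = bmh_search("abracadabra abracadabra", "cadab")
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(matches, comparisons, f"{elapsed_ms:.3f} ms")
```

The same counter and timer can be wrapped around any of the other variants, so that all algorithms are measured in the same way.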





TABLE 8: Experimental Results (columns: S.N., Algorithm, No. of Comparisons, Search Time in milliseconds) 

On the basis of the experimental results we plot bar graphs of 
the number of comparisons and the search time, as shown in 
Graph 1 and Graph 2. 

Graph 1: Number of Comparisons of Different Algorithms (BM, BMH, BMHS, Improved BMHS, BMI, BMHS2) 

Graph 2: Searching Time in milliseconds of Different Algorithms (BM, BMH, BMHS, Improved BMHS, BMI, BMHS2) 

V. Conclusion 

BM and its related algorithms are compared on the basis 
of two factors: the number of comparisons performed and 
the search time. In both the example and the experiment, 
the number of comparisons for BM, BMH and BMHS is almost 
the same, so their performance is almost equal. Improved 
BMHS performs better than BMHS as the number of comparisons 
decreases, and BMI and BMHS2 perform better still. In the 
experiment we also compare search time: BM and BMH perform 
almost the same, but the BMHS search time increases. The 
Improved BMHS search time is lower than that of BM, BMH and 
BMHS. BMI is faster than these four, and the BMHS2 search 
time is lower than that of BMI. We therefore conclude that 
BMHS2 is the best of the six algorithms in our experiment, 
as both its search time and its number of comparisons are 
lower than those of all the other algorithms. 

The Composite Boyer-Moore algorithm is efficient for 
binary searching, where the alphabet is small and the 
pattern is long. 

The performance of an algorithm depends on two factors: 
first the input (the number and type of inputs), and second 
the methodology of the algorithm. Some variation in 
performance may therefore occur as the input changes. 

VI. FUTURE WORK 

The focus of future work is to improve the existing 
algorithms and to find efficient string searching 
algorithms so that searching speed and performance 
can be increased. 

Abstract - This paper presents a novel approach to speaker 
identification using spectrograms and the Hartley transform. 
The performance of this approach is observed to improve up 
to 88.34% when the feature vector is cropped from a specific 
area of the transformed spectrograms. The computational 
complexity of this technique is found to be the same as that 
of our previous work using DCT and spectrograms. 
Keywords-Speaker Identification, Hartley Transform, 
Spectrogram 

I. Introduction 

With the extensive use of networks and computer 
systems, the security of users' data has become a major issue. 
Various security measures are successfully in use, and 
many more are under research and development to provide 
trustworthy systems. Biometrics is one such security 
technique, in which the user is authenticated before 
being allowed access to information or resources. Face 
recognition, fingerprint recognition, iris recognition, palm 
print recognition and speaker identification are some of the 
well-known biometric techniques. All of these techniques use 
some physical or behavioral characteristic of a human being, 
and each has its own pros and cons. Speaker identification is 
the identification of a speaker based on the unique 
characteristics present in the speech signal [1]. In this paper, 
closed-set text-dependent speaker identification is considered. 
In the proposed method, speaker identification is carried out 
with spectrograms and the Hartley transform. This 
work is an extension of our previous work on speaker 
identification using transforms such as DCT, Walsh 
and Haar [2], [3], [4], [5]. As in those techniques, 
a cropped (partial) feature vector is selected in the 
present work using the Hartley transform, to observe its 
effect on the accuracy of the system. Selecting an appropriate 
area for cropping the feature vector has proved to be an 
interesting question in speaker identification using 
spectrograms and various transformation techniques. 



The rest of the paper is organized as follows: In section 2 
we present related work carried out in the field of speaker 
identification. In section 3, proposed approach and 
experimental work has been presented. Results are tabulated 
in section 4. Conclusion has been outlined in section 5. 

II. Related Work 

Many techniques are used to represent a voice signal 
parametrically for the speaker recognition task. Mel 
Frequency Cepstrum Coefficients (MFCC) are the most popular. 
Davis and Mermelstein [6] have described the energy 
distribution of the speech signal in the frequency domain. Wang 
Yutai et al. [7] have proposed a speaker recognition system 
based on dynamic MFCC parameters. This technique 
combines the speaker information obtained by MFCC with 
the pitch to dynamically construct a set of the Mel-filters. 
These Mel-filters are further used to extract the dynamic 
MFCC parameters which represent characteristics of 
speaker's identity. Another histogram based technique has 
been proposed by Sleit, Serhan and Nemir [8]. This 
histogram based speaker identification technique uses a 
reduced set of features generated using MFCC method. For 
these features, histograms are created using predefined 
interval length. These histograms are generated first for all 
data in feature set for every speaker. In second approach, 
histograms are generated for each feature column in feature 
set of each speaker. Vector Quantization (VQ) is yet another 
approach to feature extraction [9], [10], [11], [12]. In Vector 
Quantization based speaker recognition systems, each 
speaker is characterized by several prototypes known as 
code vectors [13]. Speaker recognition based on 
non-parametric vector quantization was proposed by Pati and 
Prasanna [14]. Another widely used method for feature 
extraction is the use of Linear Prediction Coefficients (LPC). 
LPCs capture information about the short-time spectral 
envelope of speech. LPCs represent important speech 
characteristics such as formant frequency and 
bandwidth [15]. 

III. Proposed Technique 



Before proceeding with the proposed algorithm, we 
would like to present some observations on the Hartley 
transform and its energy distribution. 

The DCT tends to concentrate the energy (or entropy) of 
an image at its top left corner. This makes it possible to 
crop the feature vector of an image as shown in Fig. 1. 

Figure 1. Cropping of feature vector in case of DCT on image 

Figure 4. Cropping of feature vector when Hartley transform is applied to 
an image 

The outermost rectangle represents the feature vector 
without cropping. As we move towards the upper left corner 
we get cropped feature vectors of smaller sizes. 

This, however, does not hold true for the Hartley transform. 
If we crop the feature vector of a Hartley transformed image 
to take only the top left portion as the feature vector, we end 
up with reduced accuracy for speaker identification. 

The reason lies in where the Hartley transform tends to 
concentrate the entropy of an image. If we plot the 
energy of an image under the DCT and under the Hartley 
transform, we can easily see the difference between the areas 
of energy concentration in the two cases. For the DCT, it is 
the top left corner of the image, as shown in Fig. 2. 




Figure 2. Energy distribution for DCT on one of the spectrogram from 
dataset 

However, for the Hartley transform, the energy of an image is 
observed to be concentrated in the four corners of the image, 
as shown in Fig. 3. 

Figure 3. Energy distribution for Hartley on one of the spectrogram from 
dataset 



Hence, when cropping the feature vector from a Hartley 
transformed image, it makes sense to select the coefficients 
from the four corners of the transformed image, unlike the 
DCT, as shown in Fig. 4. The outer rectangle shows the full 
feature vector of an image. The rectangles of the same color 
at the four corners show the cropped feature vectors. These 
four corners are then appended to form the cropped feature 
vector of the image. Fig. 5 shows the energy distribution of 
such appended four corners. 
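
The four-corner energy concentration can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the implementation used in this work: it assumes the non-separable 2-D discrete Hartley transform with the cas kernel (cos + sin), evaluates it naively (suitable only for tiny inputs), and uses a synthetic 8*8 ramp instead of a real spectrogram. The names `dht2` and `corner_energy_share` are hypothetical.

```python
import math


def dht2(image):
    """Naive 2-D discrete Hartley transform using the cas kernel (cos + sin)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    angle = 2.0 * math.pi * (u * x / rows + v * y / cols)
                    total += image[x][y] * (math.cos(angle) + math.sin(angle))
            out[u][v] = total
    return out


def corner_energy_share(coeffs, n):
    """Fraction of the total energy held by the four n-by-n corner blocks."""
    rows, cols = len(coeffs), len(coeffs[0])
    total = sum(c * c for row in coeffs for c in row)
    row_idx = list(range(n)) + list(range(rows - n, rows))
    col_idx = list(range(n)) + list(range(cols - n, cols))
    corner = sum(coeffs[r][c] ** 2 for r in row_idx for c in col_idx)
    return corner / total if total else 0.0


# Toy data: a smooth 8x8 ramp stands in for a spectrogram (not from the dataset).
image = [[float(x + y) for y in range(8)] for x in range(8)]
share = corner_energy_share(dht2(image), 2)
```

For smooth inputs such as this ramp, most of the energy falls in the corner blocks, because low frequencies sit at indices near 0 and near the wrap-around end of each axis.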



Figure 5. Energy distribution of cropped feature vector obtained by 
appending the four corners of Hartley transformed image 

We now present the proposed technique. 

The experimental work uses the same database of speakers 
as our previous work [2], [3], [4], [5]. 

This database contains speech sentences recorded for 30 
speakers. Each speaker was recorded with six different 
sentences uttered at different times. Ten occurrences of each 
sentence were recorded in order to have sufficient training 
and testing sets. In all, this results in a collection of 1800 
speech samples. The next step is to convert these 
samples of continuous signals into an image dataset by creating 
a spectrogram from each sample after some pre-processing. 
The Hartley transform is then applied to this database of 
1800 spectrograms. 

From the ten spectrograms for each speaker and each 
sentence, eight spectrograms are randomly selected for 
training the system. The remaining two spectrograms are used 
for testing the accuracy of the system. Thus the training set 
consists of 1440 spectrograms and the test set consists of 360 
spectrograms. The algorithmic steps carried out on 
the training images are given below: 

Step 1: Resize the input trainee image to size 256*256. 

Step 2: Apply Hartley transform to this resized image. 

Step 3: Select four corners of size 2*2 each from the output 
obtained in step 2 and append them to form cropped 
feature vector of size 4*4. 

Repeat steps 1 to 3 for each spectrogram in the training set 
and also for various corner sizes such as 4, 8, 10, 16, 32 and 
64. 
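
Step 3 can be sketched as follows. The text only says that the four corners are appended, so the layout used here (top-left beside top-right, above bottom-left beside bottom-right) is an assumption, as is the function name `crop_corners`. The input is any 2-D array of transform coefficients stored as a list of rows.

```python
def crop_corners(coeffs, n):
    """Join the four n-by-n corner blocks of a 2-D array into a 2n-by-2n array."""
    rows, cols = len(coeffs), len(coeffs[0])
    if 2 * n > rows or 2 * n > cols:
        raise ValueError("corner size too large for this array")
    # Top rows: left corner followed by right corner.
    top = [coeffs[r][:n] + coeffs[r][cols - n:] for r in range(n)]
    # Bottom rows: left corner followed by right corner.
    bottom = [coeffs[r][:n] + coeffs[r][cols - n:] for r in range(rows - n, rows)]
    return top + bottom


# Toy 6x6 grid whose entries encode their own position (10 * row + column).
grid = [[10 * r + c for c in range(6)] for r in range(6)]
feature = crop_corners(grid, 2)
```

Here `feature` is a 4*4 array built from the 2*2 corners of `grid`, matching the size relation 2N*2N given in the steps.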

Similarly, for test images, the following steps are performed. 

Step 1: Resize the input test image to size 256*256. 

Step 2: Apply Hartley transform to this resized image. 

Step 3: Select four corners of size N*N each from the 
output obtained in step 2 and append them to form 
cropped feature vector of size 2N*2N. 

Step 4: Calculate the Euclidean distance between the test 
image and each of the training images of the 
corresponding sentence. 

Step 5: The training image with the minimum Euclidean 
distance is the spectrogram of the identified speaker. 

Repeat steps 1 to 5 for each spectrogram in the test set and 
also for various corner sizes such as 2, 4, 8, 10, 12, 16, 20, 32 
and 64. Calculate the accuracy of the system by taking the 
average of the accuracy obtained for each sentence. 
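
Steps 4 and 5 amount to a nearest-neighbour search under the Euclidean distance. A minimal sketch follows, assuming that each cropped feature is stored as a list of rows and that the training features for one sentence are kept as (speaker id, feature) pairs. The names `euclidean_distance` and `identify_speaker` and the toy values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally sized 2-D feature arrays."""
    return math.sqrt(sum((p - q) ** 2
                         for row_a, row_b in zip(a, b)
                         for p, q in zip(row_a, row_b)))


def identify_speaker(test_feature, training_set):
    """Return the speaker id of the training feature closest to test_feature.

    training_set is a list of (speaker_id, feature) pairs for one sentence.
    """
    best_id, _ = min(training_set,
                     key=lambda item: euclidean_distance(test_feature, item[1]))
    return best_id


# Toy features, not taken from the dataset.
training = [("spk1", [[1.0, 2.0], [3.0, 4.0]]),
            ("spk2", [[9.0, 9.0], [9.0, 9.0]])]
result = identify_speaker([[1.0, 2.5], [3.0, 4.0]], training)
```

Accuracy for a sentence is then the fraction of test spectrograms for which the returned id equals the true speaker.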

IV. Results 

Table I shows the accuracy obtained for 
various corner sizes cropped from the feature vector. The 
results are further broken down by sentence. The average 
accuracy over all six sentences for a particular corner size 
gives the overall accuracy of the system. 

It can be observed that an accuracy of 90% is obtained for 
sentences S4, S5 and S6 with cropped corners of size 12*12, 
that is, for a cropped feature vector of size 24*24. This 
corner size also gives the highest average accuracy, 88.34%. 



TABLE I. Accuracy (%) of the Speaker Identification system 
for cropped features of various sizes obtained by applying 
Hartley Transform 

Corner size   S1      S2      S3      S4      S5      S6      Average Accuracy 
2*2           53.33   56.67   53.33   56.67   53.33   53.33   54.44 
4*4           70      76.67   73.33   80      75      78.33   75.55 
6*6           80      83.33   88.33   83.33   86.67   86.67   85.05 
8*8           85      88.33   86.67   86.67   91.67   90      88.05 
10*10         86.67   88.33   85      86.67   90      88.33   87.5 
12*12         86.67   86.67   86.67   90      90      90      88.34 
16*16         88.33   86.67   85      86.67   85      90      86.94 
20*20         88.33   88.33   85      83.33   85      88.33   86.39 
32*32         80      80      78.33   85      81.67   88.33   82.22 
64*64         78.33   75      80      78.33   81.67   81.67   79.17 

V. Conclusion 

From the experimental work carried out, it can be 
concluded that the Hartley transform can be used efficiently 
for speaker identification, like the DCT, Walsh and Haar 
transforms. The accuracy of the system increases up to a 
specific size of the cropped feature vector. The computational 
complexity of the Hartley transform is 2N^3 multiplications 
and 2N^2(N-1) additions, where N*N is the size of the image. 
In our case the image is of size 256*256, which gives 
33,554,432 multiplications and 33,423,360 additions. This 
complexity is similar to the computational complexity of the 
DCT on spectrograms for speaker identification. The only 
difference between the two techniques is the area from which 
the feature vector is cropped. This shows that the Hartley 
transform can be used effectively for a biometric technique 
such as speaker identification. 

Abstract - Tracking is the process of estimating the path of 
an object as it moves through the scene in the image plane. 
In other words, it is a strategy to detect and track a 
moving object through a sequence of frames. Here, the 
Normalized Cross-Correlation strategy is used both for 
matching and for updating the template in order to track an 
object in an outdoor environment. The proposed method is 
evaluated on extensive benchmark datasets. The resulting 
system is able to track a single object or multiple 
objects under varied illumination conditions. The outcome 
demonstrates the value of the proposed system. 

Keywords- Object tracking, Normalized Cross-Correlation, 
Frame difference, Template updating. 



I. INTRODUCTION 

Object tracking is basically an attention-drawing 
mechanism. It is also the process of establishing the 
correspondence between objects across a sequence of frames. 
It has many applications, the most important being video 
surveillance, traffic monitoring and robot vision. There is 
no dearth of literature on tracking objects in a moderately 
complex scene. Tracking can be based on spatial or 
appearance-based models, several methods work in the 
frequency domain, and hybrid approaches also perform 
effectively. The main approaches to tracking an object in a 
scene are point tracking, kernel tracking and silhouette 
tracking. Template matching is a sub-class of kernel tracking 
[8]. Factors that make object tracking complex include changes 
in color and illumination, noise in the images, abrupt motion 
of the objects and the computational demands of real-time 
processing [8]. Detection and tracking of objects is currently 
a prime research topic in computer vision. One application is 
the analysis of traffic scenes: vehicle detection is important 
for civilian as well as military use, especially in aerial and 
ordinary traffic scenes, since vehicles are a vital part of 
human life. 

This paper proposes a system that tracks the object 
robustly using the correlation between object and template, 
and that updates the template with the help of normalized 
cross-correlation. The process has three building blocks. 
First, the template is correlated with the image and the 
object is located based on the correlation score. Second, a 
frame differencing algorithm is employed to produce the 
motion regions. Finally, the sub-images corresponding to the 
motion regions are cropped from the frames and stored. The 
existing template is then correlated with the sub-images, and 
the best match replaces it as the new template. This process 
is repeated at every fixed interval of frames. Experiments 
were conducted on the benchmark datasets PETS 2001 (clips 1, 
2 and 3) and VISOR (video for traffic surveillance clip); 
their details are given in Table I. 

TABLE I. Datasets 

Sr No   Dataset         # of frames   Contents                 Camera 
1       PETS 2001 (1)   2343          Human, Cars and People   Side fixed, Moving tree 
2       PETS 2001 (2)   2240          Human, Cars and People   Top-Down fixed 
3       PETS 2001 (3)   2688          Human, Cars and People   Side fixed 
4       VISOR           1495          Human, Cars              Side fixed 

The paper is organized as follows. Section 2 deals 
with the related work. Section 3 presents the proposed 
method. The experiments and results are discussed in section 4. 
The conclusion and future work are given in section 5. 

II. COMPREHENSION OF RELATED WORK 

J.P. Lewis et al. [4] demonstrate the potential of 
normalized cross-correlation based template matching in the 
spatial domain. 

Alan J. Lipton et al. [1] combine frame differencing and 
template matching to highlight the object in a scene. The 
template matching is guided by temporal differencing and 
image-based correlation to make the tracking process robust. 
An Infinite Impulse Response (IIR) filter is used to update 
the template; this is known as the adaptive template matching 
method. Hieu T. Nguyen et al. [2] track a rigid object using a 
Kalman filter, and the template is updated through an 
adaptive Kalman filter so that it adapts to the changing 
illumination and orientation of the object. 

Longin Jan Latecki et al. [6] proposed a strategy based on 
a selective hypothesis tracking algorithm. It uses the motion 
regions, image alignment and minimum cost estimation to 
update the template dynamically. In other words, minimum cost 
matching is established through association between the 
motion region and the aligned template, and the motion vector 
is updated accordingly. 

Karan Gupta et al. [5] described dynamic template matching 
combined with control of the camera field of view by PTZ, 
using the frame difference approach with a properly chosen 
threshold. This strategy updates the template at every 
instant, although it is limited to a single object in a 
scene. 

Xue Mei et al. [9] used a probabilistic algorithm for 
tracking that includes template matching and incremental 
subspace update. The templates are modeled using mixed 
probabilities and updated when the object appearance changes 
considerably. The update is obtained by augmenting the 
kernel Gram matrix with a row and a column. 

Jiyan Pan et al. [11] address template drift, the gradual 
shifting of the template away from the object during tracking. 
The work carefully observes where template drift occurs and 
updates the template accordingly. A Kalman Appearance Filter 
[11] is employed to update the template. 

Wenhui Liao et al. [10] introduced a method based on 
Case Based Reasoning (CBR) to maintain an accurate template 
of the object automatically. In other words, the algorithm 
dynamically updates the case base (template). With this, a 
real-time face tracker is built to track the face robustly 
under different orientations and conditions. 



The literature surveyed up to this point has encouraged us 
to propose a system based on normalized cross-correlation to 
track the object and update the template. 

We therefore propose, as a novel strategy, to update the 
template by combining frame difference with normalized 
cross-correlation. In other words, this work concentrates on 
a hybrid model for updating the template. The proposed work 
supports tracking of single and multiple objects in a scene 
and addresses the limitations observed in the literature. 

III. PROJECTED PROCESS 

This section presents the proposed work, which aims to 
track the object and update the template. The simplified 
block diagram of the system is shown in Figure 1. 



Figure 1: Proposed system (blocks: pre-process the acquired image and set Count = 0; compute cross-correlation of image and template; localize the object and advance the Count; crop the sub-images corresponding to moving objects detected by frame difference; cross-correlate the sub-images and template to renew the template) 



The normalized cross-correlation is computed using the 
expressions in equations 1, 2 and 3. The location where the 
correlation score is maximum is then determined; this 
location is the best match and gives the evidence for placing 
the bounding box over the object. 

$$d^2_{f,t}(u,v) = \sum_{x,y} \left[ f(x,y) - t(x-u,\, y-v) \right]^2 \tag{1}$$

where f(x, y) is the image and t(x-u, y-v) is the template 
positioned at (u, v). $d^2_{f,t}(u,v)$ is the squared Euclidean 
distance, and the summation is over x and y. Expanding the 
square gives 

$$d^2_{f,t}(u,v) = \sum_{x,y} \left[ f^2(x,y) - 2 f(x,y)\, t(x-u,\, y-v) + t^2(x-u,\, y-v) \right]$$

If the terms $\sum_{x,y} f^2(x,y)$ and $\sum_{x,y} t^2(x-u,\, y-v)$ are treated 
as constants, the remaining term, called the cross-correlation, is 

$$c(u,v) = \sum_{x,y} f(x,y)\, t(x-u,\, y-v) \tag{2}$$

It is used as a measure of similarity between the image 
and the template. 

The difficulties are noticed such as image energy which causes 
correlation score minimum, sorting of C(u,v) depends on 
template size, change in illuminations not affecting the 
equation (2) are eliminated through a process of 
normalization. Therefore the normalized cross -correlation (y) 
expressed through equation 3 as follows. 
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moving objects. In the sequel the centroids of moving objects 
are estimated. 

The proposed strategy crops sub-images around the centroids and 
then computes the cross-correlation score between the template and 
each sub-image. The best match becomes the new template, and the 
update is repeated at an interval of k frames. As observed 
empirically in our experiments, the value of k varies with the 
dataset, as portrayed in the plot shown in Figure 2. The entire 
process is presented in the subsequent section in two phases: the 
first algorithm predominantly performs the object tracking task, and 
the second is dedicated to updating the template, which in turn 
supports and provides enhanced knowledge to track the object. 

In the normalized cross-correlation, \bar{f}_{u,v} and \bar{t} are 
the means of the image and the template, respectively. 
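
The matching step can be illustrated with a minimal sketch. It assumes the commonly used definition of normalized cross-correlation (mean-subtracted product divided by the product of the two standard-deviation terms), since equation 3 itself does not survive in this copy of the text. The function names `ncc_score` and `best_match` are hypothetical, and images are plain nested lists rather than any particular image library type.

```python
from math import sqrt

def ncc_score(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists.

    Assumed standard form: subtract each mean, then divide the summed
    product by the square root of the two summed squared deviations.
    """
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = sqrt(sum((a - mean_p) ** 2 for a in p) *
               sum((b - mean_t) ** 2 for b in t))
    # A flat patch or template has zero variance; treat it as "no match".
    return num / den if den else 0.0

def best_match(image, template):
    """Slide the template over the image; return (score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # NCC lies in [-1, 1], so -2 is a safe start
    for u in range(len(image) - th + 1):
        for v in range(len(image[0]) - tw + 1):
            patch = [row[v:v + tw] for row in image[u:u + th]]
            score = ncc_score(patch, template)
            if score > best[0]:
                best = (score, (u, v))
    return best
```

The returned position is where a bounding box of the template's size would be placed. The exhaustive search is only for clarity; it is not claimed to be how the reported experiments were implemented.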

Template updating, whose necessity was discussed above, is 
achieved through equation 4. To obtain the absolute difference that 
reveals the moving objects by frame differencing, the following 
equation is used. 

D = |f_m - f_{m-1}| 

P(U) = { f(U),   D > T 
       {         D < T          (4) 



The difference image is converted to binary form by selecting a 
suitable threshold and is post-processed at a later stage using 
morphological operations. Connected component analysis then helps 
to label the moving objects. 



Algorithm-I 

1. Convert the input video into frames. 

2. Apply a median filter to obtain noise-free frames. 

3. Initialize the template. 

4. Read the i-th frame and the template, and compute the 
correlation score. Put the bounding box over the object 
for the best match. 

5. Generate and update the template after every fixed 
interval of frames using Algorithm-II. 

6. Repeat steps 4 and 5 for n frames. 

Algorithm-II 
Generate and update the template after every fixed 
interval 

1. Initialize the count for the k-interval of frames. 

2. Get the absolute difference by subtracting the m-th frame 
from the (m-1)-th frame. 

3. Convert the difference image to binary form using a 
threshold. 

4. Label the moving objects using connected component 
analysis. 

5. Determine the centroids of the moving objects. 

6. Store the cropped sub-images corresponding to the 
centroids. 

7. Declare a new template using the correlation between the 
template and the sub-images. 
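
Steps 2 to 5 of Algorithm-II can be sketched as follows. This is an illustrative outline only: it uses 4-connectivity for the connected component labeling, skips the morphological post-processing, and works on nested lists of grey values. The function name `moving_object_centroids` and these choices are assumptions, not details taken from the paper.

```python
from collections import deque

def moving_object_centroids(prev_frame, frame, threshold):
    """Frame differencing, thresholding, labeling and centroid estimation."""
    h, w = len(frame), len(frame[0])
    # Steps 2-3: absolute difference, then binarize with the threshold.
    mask = [[abs(frame[y][x] - prev_frame[y][x]) > threshold
             for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Step 4: breadth-first flood fill collects one component.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Step 5: centroid (row, col) is the mean pixel position.
            n = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / n,
                              sum(p[1] for p in pixels) / n))
    return centroids
```

Each centroid would then be used to crop a sub-image of the template's size, and the sub-image with the highest correlation score against the current template would become the new template (steps 6 and 7).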

IV RESULTS AND EXPERIMENTS 

We conducted experiments to corroborate the performance of the 
normalized cross-correlation approach. The computational 
complexity of the method turns out to be polynomial, of order 
O (n ). It was tested on the available machine: a Pentium(R) 
Dual-Core CPU T4200 @ 2.00 GHz with 2.83 GB of RAM at 1.20 GHz. 

The experiments were conducted on the PETS 2001 dataset (video 
clips 1, 2 and 3) and the VISOR video dataset (traffic surveillance 
clip), using different sets of frames that include different 
objects. Individual objects are tracked using their respective 
templates; a few of those selected for the experiments are tabulated 
in TABLE VI. A single object as small as 50 pixels is tracked 
efficiently. The template is updated every k frames, with k chosen 
empirically, which yields better performance. In the experimental 
setting, k denotes the interval in frames between template updates 
and is also known as the updating frequency. This is summarized in 
TABLE II to TABLE V and Figure 2. We noticed some interesting 
observations here, which we intend to explore further in future 
work. 

We observed that updating the template at every alternate frame 
is computationally expensive, whereas updating only after many 
frames causes the tracking to fail. A suitable update frequency k is 
therefore chosen empirically for its stability. Experimentation also 
showed that tracking performance improves with the size of the 
template: the larger the template, the better the tracking. The 
proposed system performed robustly over different sets of frames 
comprising different objects and varied illumination conditions. 
TABLE VI shows that the mis-tracking rate is minimal. The tracking 
results can be observed in Figure 5 (a), (b), (c) and (d), which 
show a human, a car (dark), a car (white) and people, respectively. 



TABLE II. Effect of template updating on tracking for PETS 2001 (1) 
with 52 frames. 

TABLE III. Effect of template updating on tracking for PETS 2001 (2) 
with 70 frames. 

TABLE IV. Effect of template updating on tracking for PETS 2001 (3) 
with 52 frames. 

TABLE V. Effect of template updating on tracking for VISOR with 
23 frames. 

Figure 2: Template update frequency vs. dataset, showing the 
variation of update frequency with the dataset. 










Figure 3: PETS 2001 (1), panels (a) to (d). 

Figure 4: PETS 2001 (2), panels (a) to (d). 

Figure 5: PETS 2001 (3), panels (a) to (d). 

Figure 6: VISOR (video for traffic surveillance). 

Table VI. Object tracking on the PETS 2001 (3) video: type and 
number of objects, number of frames, and tracking and mis-tracking. 

[3] http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html 

V CONCLUSION AND FUTURE WORK 



In this paper, normalized cross-correlation is used as the feature 
for tracking multiple objects. The procedure is able to track an 
object as small as 50 pixels, and the template update frequency is 
decided empirically as k frames. We observed that a larger template 
gives better tracking, whereas a smaller one gives poorer tracking. 
Experimental results on the PETS 2001 and VISOR video datasets 
reveal that the approach is capable of spotting and tracking the 
object correctly. Future work can focus on tracking objects in 
different sets of videos and on handling partial and full 
occlusions. Many future avenues can thus be pursued based on the 
success reported in this paper. 
Abstract - The main objective of higher education is to provide 
quality education to students. One way to achieve the highest level 
of quality in a higher education system is to discover knowledge 
for prediction regarding the enrolment of students in a course. 
This paper presents a data mining project to generate predictive 
models for student retention management. Given new records of 
incoming students, these predictive models can produce short, 
accurate prediction lists identifying the students who tend to need 
the support of the student retention program most. This paper 
examines the quality of the predictive models generated by the 
machine learning algorithms. The results show that some of the 
machine learning algorithms are able to establish effective 
predictive models from the existing student retention data. 

Keywords — Data Mining, Machine Learning Algorithms, 
Retention Management, Predictive Models 

I. Introduction 

Student retention is a challenging task in higher education 
[1], and it is reported that about one fourth of students drop out 
of college after their first year [1-3]. Recent study results show 
that intervention programs can have significant effects on 
retention, especially for the first year. To utilize the limited 
support resources for the intervention programs effectively, it is 
desirable to identify in advance the students who tend to need the 
support most. In this paper, we describe the experiments and 
results of applying data mining techniques to the students of the 
MCA department to assist the student retention program on campus. 
The development of machine learning algorithms in recent years has 
enabled a 
large number of successful data mining projects in various 
application domains in science, engineering, and business [4, 
5]. In our study, we apply machine learning algorithms to 
analyze and extract information from existing student data to 
establish predictive models. The predictive models are then 
used to identify among new incoming first year students 
those who are most likely to benefit from the support of the 
student retention program. 

Prediction models that include all personal, social, 
psychological and other environmental variables are necessary for 
the effective prediction of the retention rate of students. 
Predicting student retention with high accuracy helps to identify 
students with low academic achievement at an early stage. The 
identified students can then be given more assistance by the 
teacher so that their performance improves in future. 

In this connection, the objectives of the present investigation 
were framed so as to assist low academic achievers in higher 
education. They are: 

a) Generation of a data source of predictive variables. 

b) Identification of the different factors that affect a 
student's retention rate. 

c) Construction of a prediction model using different 
classification data mining techniques on the basis of the 
identified predictive variables, and 

d) Validation of the developed model for higher education 
students studying in Indian Universities or Institutions. 

II. BACKGROUND AND RELATED WORK 

The most commonly cited model of retention studies is 
one developed by Tinto [2]. According to Tinto's Model, 
withdrawal process depends on how students interact with 
the social and academic environment of the institution. 

Kember [6] describes that, in an open distance learning 
context, researchers tend to place more emphasis on the influence 
of the external environment, such as a student's occupation and 
support from their family, while the concept of social integration 
into an open distance learning institution's cultural fabric is 
given less weight. 

A number of Open Distance Learning institutions have 
carried out dropout studies. Some notable studies have been 
undertaken by the British Open University (Ashby [7]; 
Kennedy & Powell [8]). Different models have been used by 
these researchers to describe the factors found to influence 
student achievement, course completion rates, and 
withdrawal, along with the relationships between variable 
factors. 

Pandey and Pal [9] conducted a study on student performance by 
selecting 600 students from different colleges of Dr. R. M. L. 
Awadh University, Faizabad, India. By means of Bayes classification 
on category, language and background qualification, they determined 
whether newcomer students would perform or not. 

Hijazi and Naqvi [10] conducted a study on student performance 
by selecting a sample of 300 students (225 males, 75 females). The 
hypothesis framed was: "Student's attitude towards attendance in 
class, hours spent in study on daily basis after college, students' 
family income, students' mother's age and mother's education are 
significantly related with student performance". By means of simple 
linear regression analysis, it was found that factors such as 
mother's education and student's family income were highly 
correlated with student academic performance. 

Khan [11] conducted a performance study on 400 students with the 
main objective of establishing the prognostic value of different 
measures of cognition, personality and demographic variables for 
success at higher secondary level in the science stream. It was 
found that girls with high socio-economic status had relatively 
higher academic achievement in the science stream, and boys with low 
socio-economic status had relatively higher academic achievement in 
general. 

Al-Radaideh et al. [12] applied a decision tree model to predict 
the final grade of students who studied the C++ course at Yarmouk 
University, Jordan, in the year 2005. Three different 
classification methods, namely ID3, C4.5 and Naïve Bayes, were 
used. Their results indicated that the decision tree model gave 
better predictions than the other models. 

Pandey and Pal [13] conducted a study on student performance by 
selecting 60 students from a degree college of Dr. R. M. L. Awadh 
University, Faizabad, India. By means of association rules, they 
found the interestingness of students in opting for the class 
teaching language. 

Ayesha, Mustafa, Sattar and Khan [14] describe the use of 
k-means clustering algorithm to predict student's learning 
activities. The information generated after the 
implementation of data mining technique may be helpful for 
instructor as well as for students. 

Baradwaj and Pal [21] obtained the university students 
data like attendance, class test, seminar and assignment 
marks from the students' previous database, to predict the 
performance at the end of the semester. 

Bray [15], in his study on private tutoring and its 
implications, observed that the percentage of students 
receiving private tutoring in India was relatively higher than 
in Malaysia, Singapore, Japan, China and Sri Lanka. It was 
also observed that there was an enhancement of academic 
performance with the intensity of private tutoring and this 
variation of intensity of private tutoring depends on the 
collective factor namely socio-economic conditions. 

Bhardwaj and Pal [16] conducted a study on student performance 
by selecting 300 students from 5 different degree colleges 
conducting the BCA course of Dr. R. M. L. Awadh University, 
Faizabad, India. By means of a Bayesian classification method on 17 
attributes, it was found that factors such as students' grade in 
the senior secondary exam, living location, medium of teaching, 
mother's qualification, students' other habits, family annual 
income and student's family status were highly correlated with 
student academic performance. 

Yadav, Bhardwaj and Pal [17] obtained university students' data 
such as attendance, class test, seminar and assignment marks from 
the students' database to predict performance at the end of the 
semester using three algorithms, ID3, C4.5 and CART, and showed 
that CART is the best algorithm for classification of the data. 

III. DECISION TREE INTRODUCTION 

A decision tree is a flow-chart-like tree structure in which 
each internal node is denoted by a rectangle and each leaf node by 
an oval. All internal nodes have two or more child nodes and 
contain splits, which test the value of an expression of the 
attributes. Arcs from an internal node to its children are labeled 
with the distinct outcomes of the test. Each leaf node has a class 
label associated with it. 



The decision tree classifier has two phases [4]: 

i) Growth phase or Build phase. 

ii) Pruning phase. 

The tree is built in the first phase by recursively splitting 
the training set based on a locally optimal criterion until all or 
most of the records belonging to each partition bear the same class 
label. The resulting tree may overfit the data. 

The pruning phase handles the problem of overfitting the data in 
the decision tree. It generalizes the tree by removing noise and 
outliers, and the accuracy of the classification increases as a 
result. 

The pruning phase accesses only the fully grown tree, whereas 
the growth phase requires multiple passes over the training data. 
The time needed for pruning the decision tree is much less than the 
time needed to build it. 

A. ID3 (Iterative Dichotomiser 3) 

This is a decision tree algorithm introduced in 1986 by 
J. Ross Quinlan [18]. It is based on Hunt's algorithm. The tree 
is constructed in two phases: tree building and pruning. 

ID3 uses information gain measure to choose the splitting 
attribute. It only accepts categorical attributes in building a 
tree model. It does not give accurate result when there is 
noise. To remove the noise pre-processing technique has to 
be used. 

To build the decision tree, the information gain is calculated 
for each attribute, and the attribute with the highest information 
gain is selected and labeled as the root node. The possible values 
of the attribute are represented as arcs. All the instances 
reaching each outcome are then tested to check whether they fall 
under the same class. If they do, the node is represented with a 
single class name; otherwise, a splitting attribute is chosen to 
classify the instances further. 

Continuous attributes can be handled using the ID3 
algorithm by discretizing or directly, by considering the 
values to find the best split point by taking a threshold on the 
attribute values. ID3 does not support pruning. 
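
The attribute selection step of ID3 can be made concrete with a short sketch of entropy and information gain for categorical attributes. Records are represented as dictionaries; the attribute names in the usage note (`MED`, `CL`, `RET`) are borrowed from Table I, while the records themselves are made-up toy values used only for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after the split."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

def best_attribute(rows, attributes, target):
    """ID3 picks the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))
```

On four toy records in which `MED` separates the two `RET` classes perfectly and `CL` does not separate them at all, `information_gain` gives 1 bit for `MED` and 0 for `CL`, so `best_attribute` would choose `MED` as the root.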

B. C4.5 

This algorithm is a successor to ID3, also developed by 
J. Ross Quinlan [18], and is likewise based on Hunt's algorithm. 
C4.5 handles both categorical and continuous attributes to build a 
decision tree. To handle continuous attributes, C4.5 splits the 
attribute values into two partitions based on a selected threshold, 
such that all the values above the threshold form one child and the 
remaining values form the other. It also handles missing attribute 
values. C4.5 uses gain ratio as the attribute selection measure to 
build a decision tree, which removes the bias of information gain 
towards attributes with many outcome values. 

First, the gain ratio of each attribute is calculated. The root 
node is the attribute whose gain ratio is maximum. C4.5 uses 
pessimistic pruning to remove unnecessary branches in the decision 
tree and so improve the accuracy of classification. 
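
A minimal sketch of the gain ratio follows, assuming the usual definition (information gain divided by the split information, that is, the entropy of the attribute's own value distribution). It covers categorical attributes only and leaves out thresholds for continuous attributes, missing values and pruning. The `ID` attribute in the usage note is a hypothetical example of an attribute with many outcome values.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, attribute, target):
    """Information gain normalized by the split information."""
    n = len(rows)
    gain = entropy([r[target] for r in rows])
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / n * entropy(subset)
    split_info = entropy([r[attribute] for r in rows])
    # An attribute with a single value carries no split information.
    return gain / split_info if split_info else 0.0
```

For four toy records with two classes, an `ID` attribute that is unique per record has the maximal information gain of 1 bit but a split information of 2 bits, so its gain ratio is 0.5. A two-valued attribute that separates the classes equally well has a gain ratio of 1.0 and is preferred.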

C. ADT (Alternating Decision Tree) 

ADTrees were introduced by Yoav Freund and Llew 
Mason [19]. However, the algorithm as presented had 
several typographical errors. Clarifications and 
optimizations were later presented by Bernhard Pfahringer, 
Geoffrey Holmes and Richard Kirkby [20]. 

An alternating decision tree consists of decision nodes 
and prediction nodes. Decision nodes specify a predicate 
condition. Prediction nodes contain a single number. 
ADTrees always have prediction nodes as both root and 
leaves. An instance is classified by an ADTree by following 
all paths for which all decision nodes are true and summing 
any prediction nodes that are traversed. This is different 
from binary classification trees such as CART 
(Classification and regression tree) or C4.5 in which an 
instance follows only one path through the tree. 
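
The classification rule just described can be sketched as below. The class names `PredictionNode` and `SplitterNode`, the toy tree and all of its numbers are hypothetical and are not taken from the tree learned in this study; the sketch shows only how the values of the traversed prediction nodes are summed and how the sign of the sum is read as the class.

```python
class PredictionNode:
    """Holds a single number and any decision nodes attached below it."""
    def __init__(self, value, splitters=()):
        self.value = value
        self.splitters = list(splitters)

class SplitterNode:
    """Decision node: a predicate with one prediction node per outcome."""
    def __init__(self, test, if_true, if_false):
        self.test = test
        self.if_true = if_true
        self.if_false = if_false

def adt_score(node, instance):
    """Sum the prediction values on every path consistent with the instance."""
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.test(instance) else splitter.if_false
        total += adt_score(branch, instance)
    return total

def adt_classify(root, instance):
    """A positive sum is read as class 1, otherwise class 0."""
    return 1 if adt_score(root, instance) > 0 else 0

# Hypothetical toy tree; the numbers are illustrative only.
toy_tree = PredictionNode(0.5, [
    SplitterNode(lambda s: s["GS"] == "BCA",
                 PredictionNode(0.3), PredictionNode(-0.2)),
    SplitterNode(lambda s: s["MED"] == "English",
                 PredictionNode(-0.1, [
                     SplitterNode(lambda s: s["CL"] == "Urban",
                                  PredictionNode(0.2), PredictionNode(-0.9)),
                 ]),
                 PredictionNode(0.4)),
])
```

Unlike a C4.5 tree, every decision node attached to a reached prediction node is evaluated, so one instance contributes to several paths at once.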

The original authors list three potential levels of 
interpretation for the set of attributes identified by an 
ADTree: 

• Individual nodes can be evaluated for their own 
predictive ability. 

• Sets of nodes on the same path may be interpreted 
as having a joint effect 

• The tree can be interpreted as a whole. 

IV. Data Mining Process 

Knowing the reasons for student dropout can help teachers and 
administrators to take the necessary actions so that the success 
percentage can be improved. The data was collected from the 
Department of MCA of V. B. S. Purvanchal University, Jaunpur, 
India. The raw data set is a collection of 432 records, accumulated 
over a period of twelve years, containing the basic information of 
first year students and whether they continued to enroll after the 
first year. The MCA department was started in the year 1997, and 12 
batches have completed their study. In the raw data set, 398 of the 
students continued to enroll after their first year while 34 of 
them dropped out by the end of the first year. 

A. Data Preparations 

Data was collected for 432 students of the Department of MCA, 
VBS Purvanchal University, Jaunpur, who were admitted from the 
1997-2000 batch to the 2009-2012 batch. The data was collected 
through the enrolment form filled in by the student at the time of 
admission. The students enter their demographic data (category, 
gender, etc.), past performance data (SSC or 10th marks, HSC or 
10 + 2 exam marks, graduation marks, etc.), address and contact 
number. 

B. Data selection and transformation 

In this step, only those fields that were required for data 
mining were selected. A few derived variables were selected, while 
some of the information for the variables was extracted from the 
database. All the predictor and response variables derived from the 
database are given in Table I for reference. 

TABLE I: Student related variables 

Variable | Description | Possible Values 
Sex | Student's sex | {Male, Female} 
Cat | Student's category | {General, OBC, SC, ST} 
GSS | Student's grade in Senior Secondary education | {O: 90%-100%, A: 80%-89%, B: 70%-79%, C: 60%-69%, D: 50%-59%, E: 40%-49%, F: < 40%} 
GMSS | Student's grade in Math at Senior Secondary education | {O: 90%-100%, A: 80%-89%, B: 70%-79%, C: 60%-69%, D: 50%-59%, E: 40%-49%, F: < 40%, Not Applicable} 
GS | Graduation Stream | {B.A. with Math, B.A. without Math, B.Sc. with Math, B.Sc. without Math, B.Com, BCA, BBA, B.Tech} 
GOG | Grade obtained in Graduation | {First: > 60%, Second: > 45% and < 60%, Third: > 36% and < 45%} 
MED | Medium of Teaching in Graduation | {Hindi, English, Regional} 
CL | College Location | {Rural, Urban} 
ATYPE | Admission Type | {UPSEE, Direct} 
RET | Retention: continue to enroll or not after one year | {0, 1} 



The domain values defined for some of the variables in the 
present investigation are as follows: 

• Cat - Since ancient times, Indians have been divided into many 
categories. These factors play a direct and indirect role in 
daily life, including the education of young people. The 
admission process in India also reserves different percentages 
of seats for different categories. In terms of social status, 
the Indian population is grouped into four categories: General, 
Other Backward Class (OBC), Scheduled Castes (SC) and Scheduled 
Tribes (ST). Possible values are General, OBC, SC and ST. 

• GSS - Student's grade in Senior Secondary education. 
Students in the state board appear for five subjects, each 
carrying 100 marks. Grades are assigned to all students using 
the following mapping: O - 90% to 100%, A - 80% - 89%, 
B - 70% - 79%, C - 60% - 69%, D - 50% - 59%, E - 40% - 49%, 
and F - < 40%. 

• GMSS - Student's grade in Mathematics at Senior Secondary 
education. Grades in mathematics at 10+2 level are assigned to 
all students using the following mapping: O - 90% to 100%, 
A - 80% - 89%, B - 70% - 79%, C - 60% - 69%, D - 50% - 59%, 
E - 40% - 49%, and F - < 40%. If the student did not take 
mathematics at 10 + 2 level, Not-Applicable is assigned. 

• GS - Graduation Stream. MCA admission is open for all 
stream students, therefore, Graduation Stream is split 
into following classes BA with Math, B.A. without 
Math, B.Sc. with Math, B.Sc. without Math, B.Com, 
BCA, BBA, B.Tech. 

• GOG - Grade Obtained in Graduation. The marks/grade obtained 
in graduation are split into the following class values: 
First - > 60%, Second - > 45% and < 60%, Third - > 36% and 
< 45%. 

• MED - Medium of teaching. This study covers only the degree 
colleges and institutions of the Uttar Pradesh state of India. 
Here, the medium of instruction is Hindi, English or Regional. 

• ATYPE - The admission type, which may be through the 
Uttar Pradesh State Entrance Examination (UPSEE) or direct 
admission through the University procedure. 

• RET - Retention. Whether the student continues or not after 
one year. Possible values are 1 if the student continues to 
study and 0 if the student dropped out after one year. 
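
The grade mapping described for GSS and GMSS above can be sketched as a small helper. The function name `percent_to_grade` is hypothetical, and treating each band as "at least the lower bound" (so that, for example, 89.5% falls in A) is an assumption, since the text lists only whole-number ranges.

```python
def percent_to_grade(percent):
    """Map a percentage to the O/A/B/C/D/E/F grade bands of Table I.

    None stands for a student who did not take the subject and maps to
    'Not Applicable' (used for GMSS only).
    """
    if percent is None:
        return "Not Applicable"
    bands = [(90, "O"), (80, "A"), (70, "B"), (60, "C"), (50, "D"), (40, "E")]
    for lower_bound, grade in bands:
        if percent >= lower_bound:
            return grade
    return "F"
```

Such a helper would be applied once per record while preparing the data, so that the mining algorithms see only the categorical grade values.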

C. Implementation of Mining Model 

Weka is open source software that implements a large collection 
of machine learning algorithms and is widely used in data mining 
applications. From the above data, the file ret.arff was created 
and loaded into the WEKA explorer. The classify panel enables the 
user to apply classification and regression algorithms to the 
resulting dataset, to estimate the accuracy of the resulting 
predictive model, and to visualize erroneous predictions or the 
model itself. There are 16 decision tree algorithms, such as ID3, 
J48 and ADT, implemented in WEKA. The algorithms used for 
classification here are ID3, C4.5 and ADT. Under "Test options", 
10-fold cross-validation is selected as our evaluation approach. 
Since there is no separate evaluation data set, this is necessary 
to get a reasonable idea of the accuracy of the generated model. 
The model is generated in the form of a decision tree. These 
predictive models provide ways to predict whether a new student 
will continue to enroll or not after one year. 
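
The idea behind 10-fold cross-validation can be sketched independently of Weka. The generator below, with the hypothetical name `k_fold_indices`, shuffles the record indices and yields ten train/test splits so that every record is tested exactly once. It is a plain, unstratified illustration; Weka's own procedure may differ in details such as how the folds are formed.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With the 432 records of this data set, each test fold holds 43 or 44 records and the model is trained on the remaining ones; the reported accuracy is then the share of correctly classified records over all ten test folds.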

D. Results and Discussion 

Three decision trees were obtained as predictive models from the 
retention data set by three machine learning algorithms: the ID3 
decision tree algorithm, the C4.5 decision tree algorithm and the 
alternating decision tree (ADT) algorithm. For example, consider a 
new case of a student who graduated with a B.Sc. (GS = B.Sc. with 
Math), whose graduation grade is First (GOG = First), whose 
category is General (Cat = General) and who was admitted through 
UPSEE (ATYPE = UPSEE). For both the ID3 decision tree and the C4.5 
decision tree, we start from the root and find a unique path 
leading to a prediction leaf node. In both trees, we find a unique 
path of length 1 leading immediately from the root to a leaf node 
labeled 1, predicting continued enrollment the next year. 



: 0.506 
(1)GSS = A: -1.261 
(1)GSS != A: 0.389 
(2)MED = Hindi: 0.263 
(4)GOG = Second: 0.433 
|  (7)GMSS = A: -0.512 
|  (7)GMSS != A: 0.476 
|  |  (10)GSS = A: -0.372 
|  |  (10)GSS != A: 0.564 
(4)GOG != Second: -0.485 
|  (8)GSS = C: 0.438 
|  (8)GSS != C: -0.614 
(2)MED = English: -0.372 
(3)GMSS = C: -0.756 
(3)GMSS != C: 0.23 
|  (5)GS = B.A. without maths: 
|  |  (9)Cat = OBC: -0.412 
|  |  (9)Cat != OBC: 0.603 
|  (5)GS != B.A. without maths: 
|  |  (6)GSS = O: 0.533 
|  |  (6)GSS != O: -0.653 

On the other hand, for the alternating decision tree (ADT) 
shown in Fig. 1, there may be multiple paths from the root to the 
leaves that are consistent with the data, and we need to sum all 
the numbers appearing on these paths to see whether the total is 
positive or negative. In this particular case, we find three paths 
leading from the root to leaves. Summing all the numbers appearing 
on these paths gives a positive value, 
0.483+0.15-0.218+0.125+0.036=0.576, which also leads to the 
prediction of continued enrolment. These decision trees also 
provide interesting insights into hidden patterns in the student 
retention data set. For example, both the ADT and the C4.5 decision 
tree show that Graduation Stream (GS) is a very relevant factor. 

Table II shows the accuracy of the ID3, C4.5 and ADT algorithms 
for classification applied to the above data set using 10-fold 
cross-validation: 

Table II: Classifiers Accuracy 

Algorithm | Correctly Classified Instances | Incorrectly Classified Instances 
ID3 | 72.093% | 11.627% 
C4.5 | 74.416% | 25.581% 
ADT | 72.093% | 27.907% 



Table II shows that the C4.5 technique has the highest accuracy, 
74.416%, compared to the other methods. The ID3 and ADT algorithms 
also showed an acceptable level of accuracy. 

Table III shows the execution time in seconds taken by the 
various classifiers to build the model on the training data. 

Table III: Execution Time to Build the Model 

Algorithm | Execution Time (Sec) 
ID3 | 0.12 
C4.5 | 0.08 
ADT | 0.06 



Table IV below shows the three machine learning algorithms that 
produce predictive models with the best precision values in our 
experiments, together with the corresponding recall values. For 
these algorithms, the best precision values (ranging from around 
68.2% to 82.8%) are almost all accomplished when learning from the 
data set, with recall values ranging from 6.4% to 11.4%. 

The alternating decision tree (ADT) learning algorithm is the 
best precision performer we have seen so far, capable of reaching a 
precision rate of 82.8% and a recall rate of 11.4% without a sign 
of over-fitting. In other words, given a collection of 1000 new 
first year students with around 250 would-be drop-out cases 
embedded in the list (assuming a drop-out rate of 25%), the ADT 
algorithm is likely to produce a list of around 37 students, of 
whom about 31 are actual would-be drop-out cases. 
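
The arithmetic behind such an estimate follows directly from the definitions of precision and recall, and can be reproduced with the short sketch below. The function name `expected_flag_list` is hypothetical; the inputs in the usage note are the precision, recall and assumed drop-out rate quoted above.

```python
def expected_flag_list(n_students, dropout_rate, precision, recall):
    """Expected size of the flagged list and number of true drop-outs in it.

    recall    = true positives / actual drop-outs
    precision = true positives / flagged students
    """
    actual_dropouts = n_students * dropout_rate
    true_positives = recall * actual_dropouts
    flagged = true_positives / precision
    return flagged, true_positives
```

Calling `expected_flag_list(1000, 0.25, 0.828, 0.114)` gives a list of roughly 34 students containing about 28 or 29 actual would-be drop-outs, which is of the same order as the rounded figures quoted above.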

Figure 1: AD Tree. 

Table IV: Classifiers Accuracy 

Algorithm | Precision values | Recall values 
ID3 | 68.2% | 6.4% 
C4.5 | 70.4% | 9.6% 
ADT | 82.8% | 11.4% 

V. Conclusions 



Classification, one of the data mining techniques, is an 
interesting topic for researchers because it classifies data 
accurately and efficiently for knowledge discovery. Decision trees 
are popular because they produce classification rules that are 
easier to interpret than those of other classification methods. In 
this study, frequently used decision tree classifiers were studied 
and experiments were conducted to find the best classifier for 
predicting a student's drop-out possibility from retention data. 
Machine learning algorithms such as the alternating decision tree 
(ADT) learning algorithm can learn effective predictive models from 
the student retention data accumulated over previous years. The 
empirical results show that we can produce a short but accurate 
prediction list for student retention purposes by applying the 
predictive models to the records of incoming new students. This 
study will also help to identify those students who need special 
attention in order to reduce the drop-out rate. 
Abstract- Recent developments in information communication 
technology (ICT) have heightened the need for more study of this 
topic. There is a real risk that acceptance of ICT by some and not 
by others contributes to its rejection. The paper approaches 
technology acceptance from the perspective of administration by 
examining the use of ICT and e-services in the public environment. 
The theoretical framework variables of the technology acceptance 
model (TAM) are examined. The study also investigates the effect of 
the Individual Differences "recipients' beliefs" of the model of 
organization readiness to change (MORC) as external variables, in 
addition to subjective norm and voluntariness as mediating 
variables. The study tests current usage as a mediating variable 
between ICT beliefs and attitude behaviour. Most studies of ICT in 
Saudi Arabia have been carried out in the private sector. The 
survey instrument used to collect the data is a self-administered 
questionnaire developed on the basis of the technology acceptance 
questionnaire used by Davis and Venkatesh. The research population 
is Saudi workers in public organizations. The research tool is 
structural equation modelling (SEM), which requires a minimum 
sample of 200 respondents. 

Key words: Technology Acceptance model; 
Information Communication Technology; ICT 
Usage, Public organization; structural equation 
modelling; Saudi Arabia and developing countries. 



I. INTRODUCTION 



ICT is a critical means of achievement 
in both the private and public sectors, but 
ensuring ICT acceptance is a very difficult 
task for an organization, given the 
barriers the project will face. However, 
this rapid change is having a serious effect 
on the ICT project success rate and has 
created problems that threaten the 
organization's existence [1]. The benefits 
and the problems associated with ICT 
implementation and adoption need 
exploring. 

This study is an investigation of the 
behaviour of Saudi public workers. It 
investigates the actual behaviour, worker 
behaviour and acceptance behaviour of 
Saudi workers who use ICT. It then 
analyzes the relationship between social 
demographics, worker behaviour, and 
acceptance behaviour of public workers. 
Lastly, the study examines the intention 
to use ICT and the willingness to continue 
using it. 



II. RESEARCH PROBLEM 



In general, there is increasing concern 
that most organizations, despite the extent 
of ICT, have not been able to realize the 
full benefits that ICTs bring [2]. In 
order to realize the full advantages of ICT 
solutions, organizations need to identify 
the factors affecting ICT acceptance [3] 
[4] [5]. In addition, the failure rate in the 
implementation of technology calls for a 
better understanding of the process [6] [7] 
[8]. 

The wide investment in technology 
infrastructure supports ICT programs. 
Yet organizations are investing in 
technology projects at an alarming rate, 
and the failure rate associated with this 
investment makes the need to investigate 
this issue even more critical. Krigsman [9] 
affirms that the worldwide cost of 
information technology project failure is 
over $6.2 trillion. Recent developments in 
public technology have heightened the need 
for study of the technology adoption 
factors. It is becoming increasingly 
difficult to ignore the factors that cause 
adoption failures in public organizations 
and the low productivity of civil workers 
[10]. 

Although ICT acceptance is rarely the 
motivation for public workers in Saudi 
Arabia, it is an essential activity for many 
workers. However, little is known about 
public workers' behaviours and their 
preferences. This study will investigate 
the acceptance factors of public workers 
and will profile worker preferences. 
Organizations that target ICT can benefit 
from an understanding of employees' 
behaviour, and can gain advantages over 
those organizations that are less 
knowledgeable about their users. 

Many researchers [11] [12] [13] [14] 
have argued that research on how 
organizations manage problems associated 
with technology and e-services acceptance 
needs to be undertaken before the 
association between the factors affecting 
technology acceptance, use and, finally, 
the adoption of e-services can be 
established. A commonly observed 
phenomenon in e-services acceptance and 
adoption in Saudi Arabia and developing 
countries is that Saudis seem apprehensive 
about accepting technology [15]. Some 
studies have emphasised the need for a 
direct measure of the effect of social norm 
and culture on the acceptance and adoption 
of e-transactions in government 
organizations in Saudi Arabia [12] [16]. 
Richardson states that one of the main 
streams of research is the explanation and 
prediction of ICT adoption in developing 
countries [17]. 

Turner, Kitchenham, Brereton, 
Charters and Budgen [18] said that 
TAM, proposed in 1989 as a means of 
predicting technology usage, is the 
superlative tool for demonstrating 
technology acceptance [19]. Dasgupta, 
Granger and Teo and McGarry emphasise 
that many TAM studies generate diverse 
indications based on empirical facts; 
inconsistent findings abound in terms of 
the direction and the magnitude of the 
association between TAM variables [20]. 
Other studies showed unreliable 
associations. Teo [21] and Ahmad et al. 
[5] argued that using predicted use as an 
alternative to actual use of ICT is 
weakening TAM studies. 

Lee, Kozar and Larsen [22] and 
Yousafzai, Foxall and Pallister [23] 
conducted meta-analyses of TAM and 
found that one of the major problems with 
TAM research was that scholars were 
performing replication studies that provide 
very little incremental advancement to the 
literature. Researchers were not really 
expanding TAM. Lee et al. [22] noted that 
many scholars felt that the concept of a 
"cumulative tradition" was carried too far 
in all the repetitious studies of TAM, 
because the model had become an inhibitor 
of more advanced theories of ICT use. 

The acceptance of technology innovations 
for communication needs, and the factors 
that influence acceptance and adoption, 
have been studied for decades. The 
theoretical frameworks that were used to 
inform the studies include the diffusion of 
innovation theory, the expectancy-value 
model, and the technology acceptance 
model. The word 'acceptance' has been 
used by different authors with different 
meanings and in different contexts. As a 
matter of fact, the expression does not have 
any unique or specific description in the 
literature. TAM has defined acceptance as 
a user's decision about how and when to 
use technology [24]. 

The theory of planned behaviour (Ajzen 
[25]), which developed out of the theory of 
reasoned action (Ajzen and Fishbein [26]; 
Fishbein and Ajzen [27] [28]), the model of 
readiness for organizational change 
(MROC) (Holt, Armenakis, Feild and 
Harris [29]) and TAM (Davis, Bagozzi, and 
Warshaw [30]) provide the foundation of 
the model, which is integrated in this 
dissertation into the proposed theoretical 
model. 
The study of people's behaviour toward 
ICT applications has been a vital issue in 
ICT research since the 1980s. The 
theoretical foundation for the study of 
whether a person is willing to use a 
technology comes from research on 
adoption and diffusion [31]. Research in 
this area has continued to develop over the 
decades, producing other theories including 
the theory of planned behaviour (Mathieson 
[32] [33]), social cognitive theory [34] and 
TAM [35] [24] [30] [36]. 

TPB has been used successfully for 
prediction purposes in various research 
areas, including the use of structured 
interview techniques for selection 
purposes, the forecast of managers' 
personal motivation to enhance their own 
proficiencies after receiving feedback, 
readiness for organizational change 
(Jimmieson, Peach and White [37]), 
technology adoption, and intent toward 
participating in a participation program 
[38]. 

The stated purpose of TAM is to 
"provide an explanation of the 
determinants of computer acceptance that 
is general, capable of explaining user 
behaviour across a broad range of end-user 
computing technologies and user 
populations, while at the same time being 
both parsimonious and theoretically 
justified" [35]. It assumes rationality 
within the decision-making process. 
Studies have provided empirical support 
for TAM [39]. TAM also compares 
favourably with other technology 
acceptance theories [32]. Taylor and Todd 
[33] affirm that TAM typically explains 
about forty percent of the variance in a 
person's intention to use ICT and in the 
actual usage of it. 

TAM proposes that technology use 
is motivated by a person's attitude toward 
using it, which is a function of their 
beliefs about using the technology and an 
evaluation of the value of actually using it. 
This stands on "the cost-benefit paradigm 
from behavioural decision theory" [24] 
p:321, which states that a person's 
behaviour is based on his or her own 
trade-offs between the effort required to 
perform a task and the cost of this action. 
Therefore, TAM emphasizes that a person 
will use a technology if the benefit of doing 
so outweighs the effort required to use 
it [24]. 

Among the behaviours commonly 
measured are: system usage [40], and user 
satisfaction [41]. Some researchers have 
studied both of these dimensions as a 
composite [42]. User satisfaction actually 
represents a cognitive and affective 
outcome that is less tangible in terms of 
classification as behaviour. Al-Gahtani and 
King [43] concluded that ICT usage is an 
accurate measure of ICT acceptance. 

The actual use of ICT depends on the 
intention to use it, since individuals 
perform as they planned to, so long as 
they have control over their actions. In 
turn, the behavioural intention to use ICT 
applications depends on the attitude 
toward using them. Following 
the logic of the TRA framework, users' 
attitudes are determined by beliefs about 
the system and about the consequences of 
using it. 

The model of readiness for 
organizational change suggests that 
(intended and unintended) behavioural 
outcomes are due to intentions (and 
reactions) concerning those behaviours. 
Researchers have previously argued that a 
positive and favourable view toward 
organizational change, based on the degree 
to which workers believe a change is likely 
to have beneficial implications for them 
and for the organization, will lead to 
better reactions to change [44]. In turn, 
these intentions and reactions are linked 
with the attitude called readiness for 
change, which has been defined in 
numerous ways [29]. This attitude is, in 
turn, believed to be due to various change- 
related beliefs. 

Several attempts have been made to 
define change recipients' beliefs [29]. In 
addition, these change recipients' beliefs 
are related to various antecedents that fit 
within the aforementioned typology. 
Subjective norms play a crucial role. The 
proposition that subjective norms help 
predict intentions relating to supporting 
organizational change comes from the 
suggestion that collective influence will 
create pressure among workers that directs 
them to support change. Researchers have 
suggested that practitioners should take 
advantage of the group culture in 
organizations as a device for creating roots 
and unity among workers, who can influence 
and inform one another about a change in 
order to generate support and produce 
shared meaning throughout the period of 
change [45]. 

In addition, three relevant kinds of 
belief originating from the literature have 
an impact on attitudes, subjective norms, 
and perceived behavioural control [25]. 
These beliefs are: (a) behavioural beliefs, 
which affect attitudes toward the 
behaviour, (b) normative beliefs, which 
constitute the underlying determinants of 
subjective norms, and (c) control beliefs, 
which provide the basis for perceived 
behavioural control. 

TPB suggests that subjective norm is the 
antecedent generally associated with social 
pressure. It reflects "the individual's 
perception of social support or opposition 
to his performance of the behaviour [27]." 
Normative beliefs and motivation to comply 
are the two elements of subjective norms. 
Normative beliefs are the individual's 
perceptions that certain people want them 
to perform the behaviour. The individual's 
compliance represents the relative 
importance of the referent person to the 
individual. This element of behavioural 
intentions is determined by the extent to 
which the individual believes that others 
who are considered significant desire the 
behaviour, multiplied by the individual's 
desire to comply with the wishes of those 
significant others. 
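The multiplicative relationship described above is usually written in expectancy-value form. The notation below is an assumption introduced for illustration, since the text gives no symbols: n_i is the normative belief about referent i, m_i is the motivation to comply with that referent, and k is the number of salient referents.

```latex
SN \propto \sum_{i=1}^{k} n_i \, m_i
```

Under this reading, a referent whose wishes the individual does not care to follow (m_i close to zero) contributes little to the subjective norm, however strongly that referent wants the behaviour.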

Lin and Lee [46] reported the results 
of a study which described the knowledge 
sharing behaviour of employees due to 
social pressure created by senior 
managers. The study found that opinions 
of peers were more likely to influence the 
decisions made by senior managers with 
respect to knowledge sharing behaviour. 
The study's results demonstrated that the 
main drivers of overall company-wide 
knowledge sharing behaviour were the 
encouragement and expression of confidence 
by senior managers. The positivity of their 
attitudes, through encouragement and 
perceived behavioural control, greatly 
influenced intentions to advocate 
knowledge sharing. 

Perceived ease of use (PEU) is 
described as "the degree to which a person 
believes that using a particular system 
would be free of effort" [24] p:320. The 
construct reflects the amount of effort that 
would be required, relative to the person's 
perceived capabilities, to use the 
technology to accomplish the intended 
functions. 

A theoretical model put forth by 
Venkatesh [40] identified a number of 
control-related, intrinsic motivation-related 
and emotion-related determinants of PEU. 
Control was divided into perceptions of 
internal control (computer self-efficacy) 
and perceptions of external control 
(facilitating conditions). Intrinsic 
motivation was conceptualized as computer 
playfulness, while emotion was 
conceptualized as computer anxiety. Thus, 
computer self-efficacy, facilitating 
conditions, computer playfulness, and 
computer anxiety were system-independent 
variables. 

The variables were examined, and were 
found to play a very important role in 
shaping PEU beliefs related to the new 
system. The influence of these 
determinants was reduced over time due to 
growing experience with the system. 
Venkatesh put forth that objective 
usability, perceptions of external control 
(facilitating conditions) over system use, 
and perceived enjoyment would have a 
stronger influence on PEU during 
continuance. 

Perceived usefulness (PU) has been 
defined as "the degree to which a person 
believes that using a particular system 
would enhance his/her job performance" 
[24] p:320. The construct reflects an 
employee's level of conviction that a 
particular system will increase their work 
performance [35]. The PEU and PU 
relationship may be reduced over time 
[47]. 






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 10, No. 2, February 2012 



The higher the quality of the output, 
particularly the more precise and up-to-date 
the information provided, the greater the PU. 
In addition, the greater the ease of ICT 
accessibility, comprehension, and analysis, 
the greater the PU [48]. Goodwin [49] 
opined that perceived usefulness depends 
on the usability and the continuing use of 
the technology, represented by PEU. 
Mathieson [32] and Szajna [47] reported 
that PEU accounts for a significant portion 
of the variance in PU. In TAM II, 
according to Venkatesh and Davis [36], 
"PU's significant antecedents have 
included subjective norm, image, job 
relevance, output quality, and result 
demonstrability." 

Li [50] discusses ICT adoption in terms of 
the effects of the group. 
Herding occurs when an organization 
adopts an ICT based on a "me too" 
attitude. In many cases, the adoption of 
technology is a response to the fear of 
being left behind, the "herding effect." The 
herding effect results when the first bureau 
adopts a technology, and subsequent users 
adopt the same technology in order to 
minimize the risk of choosing an 
alternative technology. In situations of 
incompatible ICT technologies, committing 
to a technology earlier rather than later is 
more advantageous to the agency, due to 
the commitment power when the choice is 
irreversible [51]. 

This herding behaviour may appear 
because of an informational cascade, which 
occurs when individuals of sound mind 
begin to ignore their own findings and 
instead follow in the footsteps of previous 
decision makers [50]. In addition to 
informational cascading, Li [50] also notes 
that positive network feedback can cause a 
leading technology to grow more 
dominant. This usually results in positive 
network externalities that make an ICT 
adopter's return positively correlated with 
the number of adopters who have already 
committed themselves to the same 
technology. Therefore, herding is rewarded 
by increasing the payoffs of those ICT 
adopters who associate themselves with 
the majority. 

As technology advances, organizations 
adopt newer tasks. This can result in a 
change in the nature of work in the 
organization. For instance, tasks that were 
done manually are now done automatically 
with the aid of machines. Most 
organizations increase their readiness to 
change because the nature of work is 
changing [52] [53] [54]. 

Madsen, Miller and John [55] define 
change as a transition from one stage to 
another, in which existing structures are 
broken down to create new ones. 
According to Armenakis et al. [44], there 
are certain features of a positive work 
environment, which generally relate to 
both the workplace and individuals. These 
features encourage positive behaviours and 
attitudes toward organizational readiness 
for change. 

One of the most significant current 
discussions in commitment to change 
concerns work-related attitudes and 
behaviours. Perceived risk and habit are 
important components of user resistance 
to a new technology [56]. In 
addition, Mowday, Steers and Porter [57] 
argued that a relationship exists between 
job nature and affective organizational 
commitment, which is described as the 
motivation and desire of an employee not 
only to continue working for an 
organization but also to work to his full 
potential to help achieve the said 
organization's goals [57] p:225. 

As already discussed, technological 
advancements lead to the need for change 
in the organization. Technology also 
influences the nature of work, for instance, 
from manual to automatic tasks. Adopting 
new technologies is mandatory for 
improved performance and retaining a 
competitive advantage for the organization 
[58]. However, the organizational 
readiness for change will depend on, first, 
the availability of resources to adopt new 
technologies and, second, the employees' 
ability to coexist with the introduced 
technology [59]. New technologies 
necessitate the need for knowledge, skills, 
and expertise on how to use them. When 
employees are unfamiliar with the new 
technology, intimidation may occur, and 
hence resistance towards change. 

The organizational readiness to change 
can therefore be achieved depending on 
how the people in the organization are 
introduced to the new technology. This 
requires training, which not only acts as an 
instrument for empowering people with the 
knowledge and skills for the technology, 
but also motivates employees in their work. 
Training is one way in which the personal 
development needs of the employees are 
met, and it in turn benefits the 
organization with improved performance. 
Therefore, training increases the 
organizational readiness for change, as 
the employees are empowered towards the 
acquisition of newer tasks. On the other 
hand, lack of training reduces the 
organizational readiness for change and 
instead increases resistance to change. 
Resistance to change in organizations at 
the moment is associated with reduced 
business development. It has been 
suggested that commitment to change is 
dependent on job redesign and 
empowerment [60]. 

For performance improvement, many 
key factors come into play, for instance 
the leadership style, the motivation that 
employees receive, and how the goals and 
values of the organizational culture are 
implemented. This means that 
dissatisfaction with performance is greatly 
influenced by the perception that 
employees have of the organizational 
structure and the management models. 
Supervisory models influence the effort 
that employees are willing to put into their 
work for the benefit of the organization in 
terms of performance. Dissatisfaction with 
performance also results from external 
factors such as competition. The biggest 
problem often facing public organizations 
when it comes to evaluation is knowing 
what to evaluate. It is much more important 
to measure outcomes than inputs or 
outputs [61]. Winslow and Bramer [62] 
depict a model for human performance in 
which optimum performance lies in the 
middle of three intersecting circles of 
ability, context, and motivation. The study 
by Burke and Litwin [63] confirmed a 
strong relationship between performance 
and organizational change; moreover, they 
argued that numerous studies [64]; 
[65] have attempted to explain the impact 
of reward, nature of work, the needs of an 
individual and the values he places on 
motivation, as well as job satisfaction, on 
work performance and organizational 
change. 

With regard to computer usage, Straub 
[66] says that if an individual can 
effectively communicate electronically 
with the clients, it thus reflects a high 
level of capability on part of the 
individual. An effective employee, hence, 
believes that he or she can assess the 
usefulness of the computer-mediated work 
environment, thus bringing out positive 
changes in his or her behavioural intention 
and use of the technology. More 
importantly, an employee's computer 
self-efficacy is determined by experience, 
observation, social persuasion, and 
affective arousal. Therefore, one's 
computer self-efficacy, being changeable 
in nature, could be enhanced through 
intervention, which may include 
specifically designed training [67] [5]. 

Satisfaction with a new tool 
depends on the performance of this new 
instrument [68]. A study by Floh 
and Treiblmaier [69] identified that 
satisfaction, which is represented by 
management performance, is a very 
important attribute of technology adoption. 

Researchers have looked into various 
elements linked to this and have developed 
outlooks in terms of the affective, 
cognitive and behavioural reactions that 
different people exhibit to technology, 
along with outlining how different 
elements lead individuals to produce 
these reactions. No theoretical framework 
has been more successful than TAM [35]. 

On the other hand, defence 
mechanisms are typically utilized without 
any active awareness on the part of the 
person in question when confronted with 
danger to one's spirituality [70] [71]. 
Therefore, the relationships between the 
research variables were hypothesized as 
shown in Figure 1: 

• H1: Attitude to change negatively and 
directly influences Behavioural 
intention (BI) to use. 

• H1a: Subjective norms mediate the 
relationship between Attitude to 
change and BI. 

• H1b: Perceived voluntariness mediates 
the relationship between Attitude to 
change and BI. 

• H2: Usage "performance" positively 
and directly influences the intention to 
change. 

• H2a: Training mediates the relationship 
between usage and attitude to change. 

• H2b: The nature of work mediates the 
relationship between usage and attitude 
to change. 

• H3: Usage mediates the relationship 
between Perceived usefulness and 
attitude. 

• H4: Perceived ease of use positively 
and directly influences Perceived 
usefulness. 

• H4a: Usage mediates the relationship 
between Perceived ease of use and 
attitude. 

• H5a: Principal Support positively and 
directly influences perceived ease of 
use. 

• H5b: Principal Support positively and 
directly influences perceived 
usefulness. 

• H6a: Valence negatively and directly 
influences perceived usefulness. 

• H6b: Valence negatively and directly 
influences perceived ease of use. 

• H7a: Appropriateness positively and 
directly influences perceived ease of 
use. 

• H7b: Appropriateness positively and 
directly influences perceived 
usefulness. 
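Several of the hypotheses above are mediation hypotheses. As a rough illustration of what "mediates" means numerically, the sketch below estimates a direct and an indirect effect with ordinary least squares (the product-of-coefficients approach). This is a simplified stand-in and not the SEM procedure the study runs in AMOS; the variable names, the scores and the coefficients are all made up for the example.

```python
# Hypothetical illustration: regression-based mediation on made-up scores.
# The paper itself estimates its model with SEM in AMOS, not with this code.

def centered(values):
    """Subtract the mean so that regressions can ignore the intercept."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def simple_slope(x, y):
    """Slope of y regressed on x (with intercept)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def two_predictor_slopes(x, m, y):
    """Slopes of y regressed on x and m, solved from the 2x2 normal equations."""
    xc, mc, yc = centered(x), centered(m), centered(y)
    sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
    sxy, smy = dot(xc, yc), dot(mc, yc)
    det = sxx * smm - sxm ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    direct = (smm * sxy - sxm * smy) / det  # effect of x holding m fixed
    b = (sxx * smy - sxm * sxy) / det       # effect of m holding x fixed
    return direct, b

def mediation_effects(x, m, y):
    """Return path a (x -> m), path b (m -> y), direct and indirect effects."""
    a = simple_slope(x, m)
    direct, b = two_predictor_slopes(x, m, y)
    return {"a": a, "b": b, "direct": direct, "indirect": a * b}

# Made-up scores: predictor, mediator and outcome for five respondents.
attitude = [-2.0, -1.0, 0.0, 1.0, 2.0]
noise = [1.0, -1.0, 0.0, -1.0, 1.0]
subjective_norm = [0.5 * x + e for x, e in zip(attitude, noise)]
intention = [0.3 * x + 0.8 * m for x, m in zip(attitude, subjective_norm)]

effects = mediation_effects(attitude, subjective_norm, intention)
```

A non-zero indirect effect alongside a smaller direct effect is the pattern a mediation hypothesis predicts; a full analysis would also need standard errors, which SEM software reports.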

Although ICT acceptance is rarely the 
motivation for public workers in Saudi Arabia, 
it is an essential activity for many workers. 
However, little is known about public workers' 
behaviours and their preferences in using ICT. 
This study will investigate the acceptance 
factors of public workers and will profile 
worker preferences. Organizations that target 
ICT can benefit from an understanding of 
employees' behaviour, and can gain advantages 
over those organizations that are less 
knowledgeable about their users. 

The study aims to answer the main 
research question, "what factors affect 
employee behaviour to accept and adopt ICT 
in the Saudi public industries?", and to 
determine to what extent performance affects 
ICT acceptance and adoption. 
Figure 1. The Research Model 
III. METHODOLOGY AND FRAMEWORK 



For this paper, the MROC presented by 
Holt and colleagues [29] was combined with 
components of TAM III [72], the third 
iteration of TAM [24]. The proposed 
theoretical framework specifies potential 
relationships among variables from both TAM 
and the model of organizational readiness for 
change, together with other factors from the 
literature, namely subjective norm and 
voluntariness (Figure 1). For the theoretical 
model, the MROC serves as the template, and 
technology acceptance variables are 
incorporated into the model. The 
change-related beliefs chosen for this research 
consist of nine interrelated variables. The 
three beliefs referred to as the organizational 
change recipients' beliefs (OCRBs) are: 
appropriateness, "is this the right change"; 
principal support, "has everyone bought into 
making the change happen"; and valence, "what 
is in it for me"; commitment to change is also 
included [73] [53]. In addition to the three 
change-related beliefs, the four primary 
beliefs of TAM, among them perceived ease of 
use and perceived usefulness, are also taken 
into account. 

In addition, four factors, namely nature 
of work, training, perceived voluntariness and 
subjective norm, are included as moderators. 
These variables are not explained in this 
section since each one is discussed in greater 
detail in the sections that follow within this 
literature review. It is proposed that these 
beliefs are the result of sense making 
concerning any number of antecedents that 
could be related to organizational change 
involving technology. 

Finally, with regard to current usage, the 
discrepancy between the desired and the 
current performance levels can trigger the 
call for change in organizations. Specifically, 
if the current performance is met with 
perceived dissatisfaction, the organizational 
construct suggests that changes occur in the 
organization [59]. Yousafzai et al. [23] argue 
that variance in the type of method, 
subject, technology and utilization is 
typically prone to moderating the links that 
have been hypothesized. 



IV. PARTICIPANTS AND PROCEDURES 



Participants were 757 employees working 
at public organizations in Saudi 
Arabia. A letter of consent was obtained, and 
the questionnaires were issued. The 
participants in this study were 100% male, 
and the largest group (46%) were graduates. 
Most respondents were in the 30-34 age 
group. The survey would be highly 
influenced by supervisors with an income 
between SR 6000 and SR 7999. Most of the 
respondents had no training, and almost 20% 
of those trained received their training in the 
department for less than one week. 

Before the questionnaire was issued, the 
employees were given a brief verbal 
introduction covering the purpose of the 
visit, the objective of the study and how to 
fill in the questionnaire. The questionnaire was 
translated into Arabic and back-translated 
into English. All the questionnaires were 
issued, filled in and collected on the day of 
the visit or sent by mail. After that, the 
questionnaire data were typed manually into 
an Excel file, then transferred to PASW 18.0 
file format and coded. 

A. Data analysis 

This research makes use of 
structural equation modelling (SEM) in order 
to construct a framework that shows the 
links among the four variables that have 
been included in this research. These variables 
are intention to use, behaviour towards using 
technology, perceived usefulness, and perceived 
ease of use. Data were gathered via 
a survey questionnaire which consisted of 
questions about respondents' attributes and 
several items for each of the four 
variables included in the research. SEM was 
chosen for the research rather than regression 
analysis. This is because SEM estimates the 
relationships simultaneously, which allows an 
assessment of the links among the different 
variables together with the measurement error 
of each variable, which is estimated 
separately; this is not possible with 
regression analysis. 

AMOS (Analysis of MOment Structures) 
was used as the data analysis tool. AMOS is 
a more recent analysis package with a 
user-friendly graphical interface, and it has 
gained popularity as a much easier way 
of specifying structural models. 
AMOS additionally offers an alternative 
interface known as BASIC [74]. 
The sample was a convenience sample of at 
least 200 civic employees who work in 
western Saudi Arabia. A self-administered 
survey was used to collect data in Medina, 
Jeddah, and Yanbu. The study was conducted 
at different organizations and ministry offices 
to reduce bias. 

The instrument for data collection was 
developed based on a review of the literature 
on technology acceptance studies, change 
behaviour, TAM I, TAM II and TAM III, and 
the model of organization readiness for change. 

The steps involved in SEM were as follows. 
The data were checked for missing or 
incomplete values. After this, the 
data were validated, and the discriminant and 
convergent validities of the data were 
established. The assumptions of SEM itself 
were then examined. 
For instance, in order to make sure that the 
data were normally distributed, [74] 
suggested that the kurtosis and skew indices 
should not go beyond the values of 
10.0 and 3.0 respectively. For the results from 
SEM to be reliable, it is important that the 
sample size be at least 100-150 [75]. The 
sample size in the current study is 429, which 
satisfies this requirement. 
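The normality screen described above can be sketched in a few lines. This is only an illustration under stated assumptions: the thresholds of 3.0 (skew) and 10.0 (kurtosis) are the ones quoted in the text, the kurtosis is assumed to be excess kurtosis (the source does not say which definition was used), population moment estimators are used, and the response values are invented.

```python
# Hypothetical sketch of a skew/kurtosis screen before running SEM.
# Assumes excess kurtosis and population (biased) moment estimators.

def central_moments(values):
    """Return the second, third and fourth central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m2, m3, m4

def skewness(values):
    m2, m3, _ = central_moments(values)
    if m2 == 0:
        raise ValueError("constant data have undefined skewness")
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    m2, _, m4 = central_moments(values)
    if m2 == 0:
        raise ValueError("constant data have undefined kurtosis")
    return m4 / m2 ** 2 - 3.0

def passes_normality_screen(values, max_abs_skew=3.0, max_abs_kurtosis=10.0):
    """True when both indices stay within the quoted thresholds."""
    return (abs(skewness(values)) <= max_abs_skew
            and abs(excess_kurtosis(values)) <= max_abs_kurtosis)

# Invented construct scores for five respondents.
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
screen_ok = passes_normality_screen(responses)
```

In practice the screen would be applied to each construct in turn, and any construct that fails would be inspected before the model is estimated.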

B. Result 

The analyses in this section are 
statistical in nature. They first present the 
descriptive statistics of the measurement 
items and examine how valid the 
measurement is within the scope of the 
current study. Subsequently, the model is 
tested in terms of the various fit indices and 
the previously developed hypotheses. 

C. Descriptive Statistics 

Table 1 contains the descriptive statistics 
for all constructs. The means lie above 
3.00, with the exception of principal 
support, which has a mean of 2.7. 
Standard deviations lie within the range of 
0.77-1.24; this shows that the scores have a 
relatively narrow spread around the mean. 
The Skew index lies between -0.6 - 2.0, and 
the Kurtosis index ranged 
from -0.1 - 4.6. The data are normal for the 
purposes of SEM as per the recommendations 
put forth by [74]. 



Table 1. Descriptive Statistics of the Study Constructs 

Variables                Item   Mean   ST.D   Skewness   Kurtosis 
Principal support         4     2.68   0.77     -0.9        0.1 
Motivation Valence        4     4.02   1.03     -0.6       -0.4 
Appreciation              5     4.24   0.99     -1.3       -0.9 
Perceived Ease of Use     4     4.22   1.09     -0.4       -1.1 
Perceived Usefulness      5     4.05   0.82     -1.2        1.6 
Attitude to change        4     3.70   0.80     -2.0        5.5 
Subjective Norm           2     4.11   0.83     -2.0        4.6 
Volunteer                 4     3.81   1.00     -1.1        1.3 
Behaviour Intention       5     3.90   1.08     -1.0        0.4 



D. Convergent Validity 

In assessing the convergent validity of the
measurement items, the item reliability of each
measure, the composite reliability of each
construct, and the average variance extracted were
examined. The reliability of an item was assessed
by its factor loading onto the underlying
construct. In this study, composite reliability was
used instead of Cronbach's alpha because the latter
tends to understate reliability [75]. For composite
reliability to be adequate, a value of 0.70 or
higher is recommended [76]. The third indicator of
convergent validity, average variance extracted,
measures the overall amount of variance attributed
to the construct in relation to the amount of
variance attributable to measurement error [77].
Convergent validity is judged to be adequate when
the average variance extracted equals or exceeds
0.50. As Table-2 shows, the average variance
extracted and composite reliability met the
recommended guidelines, indicating that, in terms
of convergent validity, the proposed items and
constructs in this study
are adequate.

Table-2 Results for the Measurement Model

Latent Variables         Item   Factor    Ave. Variance        Composite
                                loading   extracted (> .50)*   Reliability (> .70)*
Principal support        Ps1     0.7      0.6                  0.7
                         Ps2     0.8
                         Ps3     0.7
                         Ps4     0.5
Motivation Valence       Mv1     2.2      0.6                  0.7
                         Mv2     0.6
                         Mv3     0.5
                         Mv4     0.5
Appreciation             Ap1     1.8      0.7                  0.7
                         Ap2     0.8
                         Ap3     0.8
                         Ap4     0.7
                         Ap5     0.6
Perceived Ease of Use    PEU1    2.3      0.7                  0.8
                         PEU2    0.7
                         PEU3    0.5
                         PEU4    0.3
Perceived Usefulness     PU1     2.5      0.7                  0.7
                         PU2     0.9
                         PU3     0.6
                         PU4     0.5
                         PU5     0.3
Attitude to Change       At1     2.7      0.7                  0.8
                         At2     0.7
                         At3     0.5
                         At4     0.3
Subjective Norm          Sn1     1.8      0.5                  0.9
                         Sn2     0.9
Volunteer                Vl1     2.4      0.7                  0.8
                         Vl2     0.7
                         Vl3     0.5
                         Vl4     0.3
Behaviour Intention      Bi1     2.3      0.7                  0.7
                         Bi2     0.8
                         Bi3     0.8
                         Bi4     0.5
                         Bi5     0.3
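The two convergent-validity statistics discussed above can be sketched as follows. The formulas are the commonly used ones for standardized loadings (average of squared loadings for AVE; squared sum of loadings over itself plus the summed error variances for composite reliability). Whether the authors computed their values exactly this way is an assumption, and the loadings in the example are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Each error variance is taken as 1 - loading^2, which assumes
    standardized loadings.
    """
    total = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error)


# Hypothetical standardized loadings for a four-item construct.
hypothetical_loadings = [0.8, 0.8, 0.7, 0.7]
ave = average_variance_extracted(hypothetical_loadings)
cr = composite_reliability(hypothetical_loadings)
print(f"AVE = {ave:.3f}, meets 0.50 guideline: {ave >= 0.50}")
print(f"CR  = {cr:.3f}, meets 0.70 guideline: {cr >= 0.70}")
```

Both helpers expect standardized loadings, so values above 1 would give a negative error variance and should not be passed in.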



E. Test of the Measurement Model Fit 

The research model was developed using the
structural equation modelling approach and was
built in AMOS [78]. Several fit indices were
used in this research. Hair et al. [75] hold
that using fit indices drawn from different
sets is a good idea. The first set comprises
absolute fit indices, which gauge the extent of
the discrepancy between the observed and
implied covariance matrices. The chi-square
($\chi^2$) statistic is used by researchers
along with the standardized root mean square
residual (SRMR), which is not used in this
research. Parsimonious indices are close to the
absolute fit indices but differ in that they
account for the complexity of the model. The
root mean square error of approximation (RMSEA)
is widely used across the research spectrum as
a parsimonious fit index, together with PCLOSE.

Table-3 lists the acceptable limits for each
fit index along with the values obtained for
the research model in this study. The values
reached the levels recommended for an
acceptable fit. The $\chi^2$ statistic is
highly sensitive to increases in the sample
size and in the number of variables examined
[75]. For this reason, the ratio of $\chi^2$ to
its degrees of freedom ($\chi^2/df$) was used.
A ratio of 3 or smaller indicates an acceptable
fit between the sample data and the
hypothesized model [79].



Table-3 Fit Indices for the Research Model

Model Fit Indices   Value   Recommended   References
                            Guidelines
$\chi^2/df$         1.6     < 3           Kline and Littel 2010; Hair, 2010
CFI                 0.9     > 0.9         McDonald and Ho, 2002; Hair, 2010
GFI                 0.9     > 0.9         Klem, 2000; McDonald and Ho, 2002; Hair, 2010
RMSEA               0.04    < 0.05        McDonald and Ho, 2002
PCLOSE              0.81    > 0.5         Klem, 2000; Hair, 2010
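Comparing the fit indices with their recommended guidelines can be sketched as a simple lookup. The cut-offs are transcribed from Table-3, with the chi-square ratio using the "3 or smaller" rule stated in the text. The helper names are hypothetical, and the non-strict comparisons are an assumption made because the reported values are rounded.

```python
import operator

# Cut-offs from Table-3. Non-strict comparisons are an assumption of this
# sketch, since the reported values are rounded to one or two decimals.
CUTOFFS = {
    "chi2/df": (operator.le, 3.0),
    "CFI": (operator.ge, 0.9),
    "GFI": (operator.ge, 0.9),
    "RMSEA": (operator.le, 0.05),
    "PCLOSE": (operator.ge, 0.5),
}


def chi2_df_ratio(chi2, df):
    """Ratio of the chi-square statistic to its degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi2 / df


def check_fit(indices):
    """Return {index name: True/False} for every index listed in CUTOFFS."""
    return {name: compare(indices[name], cutoff)
            for name, (compare, cutoff) in CUTOFFS.items()}


# Values as reported in Table-3.
reported = {"chi2/df": 1.6, "CFI": 0.9, "GFI": 0.9,
            "RMSEA": 0.04, "PCLOSE": 0.81}
print(check_fit(reported))
```

Keeping the cut-offs in one table makes it easy to swap in stricter guidelines from another source without touching the checking logic.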



F. Test of Structural Model 

Overall, eight hypotheses were supported by
the data. At this stage, the hypothesized paths
of the structural model (Figure 2) are
assessed: the researcher checks whether the
path coefficients are significant and have the
direction assumed in the model. The mediators
are inspected and evaluated in the same way,
based on the literature from which each
relationship was constructed.

It is also important to check the effect of
the new variables on the model. In total,
fifteen hypotheses were specified in the model
for the study. Table-4 lists the standardized
regression weights of the model hypotheses
shown in Figure 2.

Table-4 shows that appreciation and motivation
valence affect PEU negatively, whereas PS
significantly affects PEU positively. PU is
predicted positively by appreciation and MV,
and negatively by PS. PEU has a significant
direct positive effect on PU. PEU has a
negative association with current usage, which
is the opposite of the positive association
between PU and current usage. The path from
current usage to attitude behaviour and the
path from attitude behaviour to BI show no
significant effect. The mediators of the model
were found to mediate the relationships.



Table-4 Hypothesis Testing Results

Hypotheses   Path                    Std Regr.   Result
                                     weight
H7a          PEU <- Ap               -0.29       Not supported
H7b          PU <- Ap                 0.20       Supported
H6b          PEU <- Mv               -0.78       Not supported
H6a          PU <- Mv                 0.36       Supported
H5b          PEU <- PS                1.46       Supported
H5a          PU <- PS                -0.83       Not supported
H4           PU <- PEU                1.22       Supported
H4a          Usage <- PEU            -1.01       Not supported
H3           Usage <- PU              0.81       Supported
H2           Att. B <- Usage         -0.19       Not supported
H2a          Att. B <- Training       0.20       Supported
             Training <- Usage
H2b          Att. B <- Work          -0.02       Not supported
             Work <- Usage
H1           BI <- Att. B             0.03       Not supported
H1a          BI <- SN                 0.31       Supported
             BI <- Att. B
H1b          BI <- Vol                0.68       Supported
             BI <- Att. B




Figure -2 CFA Of the Research Model 



V. DISCUSSION 

This research aimed to: (1) identify the
factors affecting ICT that explain technology
acceptance behaviour in a developing Arab
nation, specifically Saudi Arabia; (2) extend
the TAM to explain how ICT is used in
behavioural terms; and (3) examine the role
played by norm, volunteer, training time and
work type as mediators of ICT acceptance and
adoption in explaining the behaviour linked
with ICT utilization. To address these
objectives, the researcher used the research
strategy outlined below. The study first
examined how applicable the antecedent belief
variables of (MORC) are to the TAM technology
user beliefs. Then the study examined the
effect of current usage on the TAM model.
Finally, the study examined the influence of
training and work type on the relationship
between current usage and attitude behaviour.
Moreover, the study explored the effect of norm
and volunteer on the association between
attitude behaviour and behaviour intention to
use.

Andam [80] states that appreciation of ICT, in
particular in public organizations, has a
positive influence.






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 10, No. 2, February 2012 



Unexpectedly, Table-4 shows that the
relationship between perceived ease of use and
appreciation is significantly negative. This is
due to the motivation method, which explains
how individuals' goals influence their efforts.

Davis [24] and Venkatesh [40] argue that
appreciation is the increasing value of the use
of ICT, as well as the way individuals
appreciate the use of ICT. After looking at the
principle of appreciation, it is important to
review perceived usefulness. Perceived
usefulness helps to determine why people in an
organization accept or reject information
technology. This research yields a result
similar to the claims of Davis [24] and
Venkatesh [40]: Table-4 shows that appreciation
has a direct positive relation with perceived
usefulness.

Valence is defined as the strength of an
individual's preference for a reward.
Expectancy is the probability that a particular
action leads to a desired reward, and
instrumentality indicates the individual's
estimate that performance will result in
achieving the reward. This means that if an
individual has a particular goal to achieve,
the individual must produce certain behaviour
in order to achieve this goal. Individuals also
need to weigh the likelihood that various
behaviours will help in achieving the desired
goals. If a certain behaviour is expected to
bring more success, individuals will then
prefer that behaviour [81].

Surprisingly, the table shows a direct positive
relationship between motivation valence and
perceived usefulness. To better understand the
issue of valence, it helps to consider
expectancy, which is at times described as the
performance-reward probability. Expectancy
theory gives the probability that performance
will lead to a desired goal or outcome [82].

To some extent, motivation becomes a function
of valence, instrumentality and expectancy. The
three factors in the expectancy model can exist
in an infinite number of combinations,
depending on the range of valence and the
degree of expectancy and instrumentality. A
high positive valence is achieved when there is
a combination that produces a high motivation.
When the three values are high and produce a
high motivation value, the valence will then be
a positive value [83].
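The way the three factors combine is often written in the multiplicative form commonly associated with expectancy theory. The symbols below are assumptions introduced for illustration and are not taken from the paper: $M$ is motivation, $E$ expectancy, $I$ instrumentality and $V$ valence.

```latex
\[
  M = E \times I \times V
\]
% Illustrative values only:
\[
  E = 0.8,\quad I = 0.5,\quad V = 1.0
  \;\Longrightarrow\;
  M = 0.8 \times 0.5 \times 1.0 = 0.4
\]
```

Under this form, motivation is high only when all three factors are high, and it drops to zero whenever any one factor is zero, which matches the description of combinations given above.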

In evaluating valence as a way of motivating
employees, the research found that valence does
not depend on one specific means of motivation.
There are various forms of motivation valence
[84]. This means that motivation does not come
from the activity alone, but also from other
external factors [85] [86]. This study supports
the valence-instrumentality-expectancy claim of
zero or negative effects, as Table-4 shows a
direct negative relationship between motivation
valence and perceived ease of use.

Davis [24] and Venkatesh [40] note that, on
careful observation, principal support and
perceived ease of use both contribute to ICT
implementation. ICT systems need to be improved
in order to make them easier for users. With
this ease, the organization finds that the
implementation of ICT is enhanced [40]. In this
study, principal support has a direct,
significant positive effect on perceived ease
of use, as Table-4 shows. The relationship in
this study is supported.

Research affirms that employees are eager to
learn the new ICT, yet there is a lack of
commitment from top management [87]. In such a
situation, the management has not acknowledged
the importance of ICT, which makes the
organization's members reluctant as well. With
regard to principal support, an organization
needs to identify its weaknesses in accepting
the new information system in order to achieve
a good implementation of ICT [87].

Yet Table-4 shows that principal support
negatively affects perceived usefulness, which
confirms what Bjorn and Fathul [1] showed in
their study: the lack of support from leaders
or high officials contributes to sixty per cent
of e-government initiative failures.

Perceived ease of use determines an
individual's behavioural intention to use
information technology. Perceived ease of use
is one of the constructs of TAM, a theoretical
framework that outlines the manner in which
users come to accept a particular technology.
The theory states that when a new technology is
presented to users, numerous factors can
influence their decision about when and how
they will use it [88].

PEU has shown a direct, significant positive
effect on perceived usefulness in many studies
[89]. This study arrived at the same result as
the earlier studies, as illustrated in Table-4.
The significance level of the regression weight
is very low.

Rogers confirmed that perceived ease of use is
not only people's perception of an information
system but also the extent to which the user
sees the innovation as easy to understand, use
and learn [31]. In contrast, Table-4 shows that
the association between current usage
("performance") and perceived ease of use is a
direct negative relationship. This confirms the
most recent study by Nagli, Rahmat, Samsudin,
Hamid, Ramli, MD Zaini and Jusoff [90], which
found that perceived ease of use has no
significance in today's operations.

On the other hand, the table shows that the
relationship between perceived usefulness and
usage is a direct positive relation, with a
high level of significance for the regression
weight. The outcome of the study affirms the
claim of Nagli et al. [90] that perceived
usefulness has a direct positive relationship
with performance and current usage.

Part of the study is to show how current usage
positively affects the attitude to change. The
increasing and continuous use of ICT affects
the attitude to change positively.

In this study, Table-4 shows that current
performance negatively affects attitude
behaviour, with a very low level of
significance. Change management has become
critical in the modern world of business. All
organizations seek to manage change in order to
achieve their objectives. Even though change is
important for improving organizational
performance, employees have at times resisted
change. The reason for resistance to change
among employees is the fear of losing their job
positions. Not all organization members will
take change positively.

Some organization members may not be computer
literate and may think that the introduction of
computers will threaten their jobs since they
cannot use the system [91]. Winters, Chudoba,
and Gutek, and Teo, Lee and Chai, also indicate
that one's attitude has a considerable impact
as a predictor of how a technology will be used
when the user is free to choose whether or not
to use computers [92] [93].

Management may not fully grasp the actual
level of expertise required for organizational
members to use the technology effectively. As
such, they often underestimate the training
required and the time that it will take to
implement the new ICT [36].

The organization has to train its employees in
the use of information systems. In addition,
the organization is required to train its
members about ICT at large. Given the
opportunity to learn about information systems
(IS) and ICT at large, the members will develop
interest [5].

Attitude to change is not always positive. To
the organization, change is something vital,
but to the employees change is a threat.
Organizations may often be glad to introduce
change to their daily service activities, but
organization members will always be ready to
resist the ICT applications. The theory of
planned behaviour is significant for
understanding these variables. It specifies the
nature of the relation between belief and
attitude. Individuals' evaluations of attitudes
towards behaviour are usually determined by
accessible beliefs about the behaviour [94].

A belief is basically one's own estimate of
what is most likely to happen, i.e. that a
certain act or behaviour will lead to a
specific result. Specifically, the evaluation
of an outcome is part and parcel of the
behaviour and is positively linked to a
person's subjective likelihood that the
behaviour produces the outcome in question
[27]. Davis confirmed the relation between
attitude and behaviour intention as a direct
positive relationship [24]. This study did not
confirm the hypothesis, as the table shows a
positive but non-significant weight, with a
significance level of 0.7.

The stated purpose of TAM is to "provide
an explanation of the determinants of
computer acceptance that is general, capable
of explaining user behaviour across a broad
range of end-user computing technologies and
user populations, while at the same time being
both parsimonious and theoretically justified"
[35]. It assumes rationality within the
decision-making process. Studies have
provided empirical support for TAM [39].

Subjective norm in relation to an innovation
was hypothesized to significantly influence the
user's behavioural intent to adopt the
innovation. The study shows that subjective
norm mediates the relationship between attitude
and behaviour intention, even though the
influence is not significant; the p level is
smaller than 0.08, following Hair [75].

Perceived voluntariness towards an innovation
was hypothesized to significantly influence the
user's behavioural intent to adopt that
innovation. The table shows that volunteer
mediates the relation between attitude and
behaviour: the direct path has a weight of 0.81
and a P-level of 0.3, which is significant
enough to mediate the relationship according to
Hair [75].

The table shows that training mediates the
relationship between current usage and attitude
behaviour: the direct path is 0.14 for the
relationship between training and attitude
behaviour, and the relation between usage and
attitude behaviour is negative. According to
Hair [75], the value of a path must be greater
than 0.08 in order to be significant. From
Table-4, the study also concludes that the
nature of work did not mediate the relationship
between usage and attitude, with a significance
level of 0.9.



VI. IMPLICATION FOR THE THEORY



First, this study demonstrated associations
among the TAM variables. Yet it can be observed
from the research analysis that ICT usage has
affected the relationships between the TAM
variables.

Second, the technology beliefs PEOU and PU
have opposite impacts on ICT usage. PEOU has a
direct negative effect on usage (-1.01),
whereas the effect of PU on usage is positive
(0.81), as reported in Table-4. Therefore, PU
is a deciding factor in ICT acceptance. What is
less clear is whether current usage has a
motivational effect on attitude behaviour,
given the low level of usage and training.

Third, this study expands the understanding
of TAM and shows that it is very applicable to
Arab countries (Saudi Arabia). However, this
argument needs more investigation and
examination.

Fourth, the research examined the relationship
between TAM behavioural factors. The analysis
revealed a prototype that is related to the
Saudi public industries.

Fifth, Lin and Lee [46] mention that subjective
norm moderates ICT acceptance. Also, Quaddus,
Xu and Hoque [95] noted that perceived
volunteer influences ICT usage. This study
confirms these claims and comes to a similar
result.

Sixth, this study found that training
insignificantly increases employee readiness
for change. This finding supports the arguments
of Lan and Cayer [58] and Davis and Bostrom
[96], though with insignificant influence,
because 80 % of the respondents had no
training.

Seventh, this research found that current
usage has an insignificant negative impact on
attitude behaviour, which validates the
Management Hub argument that some employees are
not familiar with using ICT and may think that
such change may disturb their work.

Finally, work type did not moderate the
relationship between usage and attitude
behaviour, in contrast to the argument of
Al-Adwani [56] and Modway, Steers and Porter
[57] that a relationship exists between job
nature and affective attitude to change.

VII. IMPLICATION FOR THE 
ORGANIZATION 

The objectives of this investigation are to
understand the typologies of employees and
their preferences regarding ICT acceptance and
adoption, and to identify the factors behind
ICT acceptance and adoption failure, so that
public organizations can gain the upper hand
over such failures in the future.

This part of the study provides some
understanding of the obstacles to ICT adoption
in GOs in Saudi Arabia. Seven important issues
were identified from the survey's outcome, as
follows. First, the most important concern for
a public organization has to do with these
questions: Will the workers be able to use the
new system? Are they prepared to use it? Have
they had suitable training? The employees'
willingness and the organizations' readiness
were observed to be low in developing
countries. These two issues significantly
affect ICT acceptance and, soon after, the
adoption of ICT. The research instrument showed
that approximately 80 % of the workers had no
training.

Therefore, employees should be trained in the
way the system works in the parts that are
associated with their jobs. In a nutshell,
achieving a close understanding of how the
whole system works is more important than what
the user needs as a part.

Second is the issue of resistance to change,
which is related to the low e-readiness among
employees. Research affirms that workers fail
to use the new ICT system due to the lack of
training, which causes workers to oppose the
ICT applications. Consequently, providing a
better solution for the use of ICT applications
is necessary to reduce the resistance.

The third issue is the language barrier
associated with the new technologies. According
to the undersecretary of the Ministry of
Commerce, language is an important barrier to
any e-system where the majority of people speak
Arabic.

Fourth, skilled workers are required to raise
the level of ICT knowledge. According to the
survey, less educated workers have low ICT
acceptance. Officials also reported that the
absence of adequate know-how hinders
governments from adopting and starting
ICT-based projects.

Fifth, Porter and Lawler [84] propose
constructing the job environment with internal
and external rewards in mind, so as to create
fulfilling work. This could be followed by the
growth of the job, which will make the job more
motivating. The job becomes more interesting
and thus fundamentally more satisfying. Reward
in the form of a high salary consequently gives
employees a motivation to work.

Sixth, according to the top administrator of
the Ministry of Civil Status, most of the users
of e-government were females because of
cultural issues in which women are expected by
custom to stay at home. Because they spend a
long time at home, they are likely to use the
e-services at home frequently for the sake of
convenience.

Last is the issue of leadership support. The
head of the information systems department at
the Ministry of Finance reported that
leadership support plays an essential role in
the execution and spreading of e-government.
According to the official, ICT must be given
high priority and should be considered a major
contributor to the economy; otherwise,
important development initiatives such as IT
education will not receive attention.
Leadership support has great influence on the
allocation of resources for technology and
e-government adoption. Furthermore, the
undersecretary of the Ministry of Commerce
argued that the commitment of leadership and
top officials to ICT is also crucial, since it
can affect the budget allocated for ICT
adoption and development in any organisation.
However, the official stated that budgets
cannot simply be raised to increase awareness
of ICT; some government institutions usually
budget for ICT and have top officials willing
to devote their time and energy to ICT. Other
organisations have a low budget allocation for
ICT, while some, such as the Ministry of
Education, have a commitment to ICT.

VIII. LIMITATION AND FUTURE STUDY 

The findings of this study have some
important implications. One, the extended TAM
is very relevant to a non-western nation, where
there is a changing degree of explanatory
power. However, more studies are still
required, particularly as the explanatory power
of the model employed is not as high as that of
TAM.

There is still a need for more investigation
of additional probable variables, which are
likely to give a strong boost to analysing ICT
behaviour in different nations apart from the
west. Two, the extended TAM or the UTAUT model
may be employed to analyse other ICT behaviour.
Three, the findings show a need for future
examination of the part that experience plays
in technology acceptance modelling. Therefore,
there is room for future research, particularly
with respect to training and compensation. Also
needed is an intensive study of usage as a
mediating variable. More research designs are
likely to strengthen the insight into the
aggregated model. A cross-section of people
within the context of ICT usage in government
organizations was investigated. Therefore,
future studies might examine larger subsets of
users in order to point out limitations and
exceptions. Longitudinal studies that examine
the hypothesised associations over time are
also recommended for future research, as is the
inclusion of an additional group of
antecedents, which includes being educated
about the system.

Lastly, the findings suggest that the formation
of a positive attitude towards ICT should occur
before the adoption of technology; as a result,
researchers should investigate training
effectiveness.

Abstract - Energy storage for portable low-power embedded
systems is one of the biggest challenges for long-duration
operation in present research and applications. These systems
are designed to operate at the lowest possible energy, in the
micro-watt or milli-watt range, and the power is supplied from a
small primary or secondary cell. This paper presents an extensive
study and an up-to-date survey to help estimate and select a
suitable energy storage device, covering both theoretical aspects
and commercially available cells. An alternative super-capacitor
based storage solution is also described as a ready reference for
estimating the requirements, with trade-offs between cost, size,
weight and other parameters, along with feasibility aspects.

Keywords — Low Power, Mathematical Modeling, Energy 
storage, Capacity, Form factor, Cost. 

I. Introduction 

An embedded system device requires a tiny storage device
with high specific capacity and long shelf life. Cost is an
important parameter, and the product should be readily
available in the market so that it can be replaced easily when
required. Normally, batteries are the primary choice for
storing energy. In this paper, different chemical compositions
of primary and secondary batteries are studied for the
characteristics that make them suitable for use in an embedded
device. The parameters that form the basis of the mathematical
model are: specific capacity, specific weight and volume,
nominal voltage, shelf life, re-chargeability and cost. This
paper also highlights super capacitors, which are emerging as
an energy storage system, since batteries have the limitation
that they take a long time to recharge. Super capacitors, with
engineering characteristics such as a very short charge time, a
good temperature range and capacitance in the 1000 F range, can
become an alternative to batteries.

II. Chemical Battery & characteristics 
All the parameters of a storage system are compared against
the "capacity" of the storage system. The following subsections
present a comparative study of all major parameters for
conventional battery/cell energy systems.
A. Specific Cost 

The prime interest in any storage system is the cost, or the
specific cost ($/Ah). Fig. 1 shows a comparison based on
commercially available batteries/cells. The result shows that
Zinc-air has the maximum cost per unit amount of energy,
whereas ZnMnO2 has the lowest cost per unit amount of energy.
Hence, if the desired parameter is a cost-effective device, the
priority will be batteries based on ZnMnO2. However, ZnMnO2 is
a primary cell. If a secondary battery needs to be selected,
the best selection is Li(CF)n.

Fig. 1 Specific cost of various chemical compositions of commercially
available batteries

B. Specific Capacity 

The specific capacity, i.e. capacity per unit mass (Ah/kg), is
another important parameter on which battery selection is
based. Fig. 2 compares this parameter for commercially
available batteries/cells. As per the graph, Zinc-air has the
maximum capacity per unit mass of cell, whereas lead acid has
the lowest capacity per unit mass of cell. Hence, if the desired
parameter is a high specific capacity device, the priority will
be batteries based on Zinc-air. Since Zinc-air is a primary
cell, Li(CF)n would be the best choice if a secondary battery
needs to be selected.

Fig. 2 Specific capacity of various chemical compositions of commercially
available batteries

Fig. 4 Specific volume of various chemical compositions of commercially
available batteries



C. Specific Weight 

For portable devices, the weight of the battery plays a
significant role. Fig. 3 compares the specific weight, i.e.
weight per unit capacity (kg/Ah). As the graph shows, the lead
acid battery has the maximum weight per unit amount of energy,
whereas Zinc-air has the lowest weight per unit amount of
energy of the cell. So, if weight is the prime selection
parameter, the Zinc-air battery tops the list. Among secondary
batteries, Li(CF)n based batteries would be the best choice.

III. Super Capacitor an Alternative to Battery

Capacitance is measured in Farads: 1 Farad is the capacitance 
of a conductor that has a potential difference of 1 Volt when 
it carries a charge of 1 Coulomb. So, 

Q = CV (1) 

where Q is the charge in Coulombs, C is the capacitance in 
Farads and V is the voltage in Volts. 

Q can also be expressed as: 

Q = I × t 

where Q is the charge in Coulombs, I is the current in Amps 
and t is the time in seconds. 

Battery capacity is quoted in Ampere-hours. Since 1 mA flowing 
for 1 hour (3600 s) transfers 3.6 Coulombs, 

1 mAh = 3.6 Coulombs (2) 

Using Eq. (1) and Eq. (2), the capacity offered by super 
capacitors can be calculated. The charging time can also be 
calculated, assuming that a capacitor is fully charged when it 
has acquired 90% of its operating voltage. 
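
The calculation described above can be sketched in Python. Eq. (1) gives the 
stored charge, and dividing by 3.6 Coulombs per mAh (Eq. (2)) converts it to 
an equivalent capacity. The charging-time estimate makes an assumption the 
text does not state: the capacitor is charged from a constant voltage source 
through a series resistance R, so that V(t) = Vs (1 - exp(-t / RC)) and 
reaching 90% takes t = RC ln(10). All component values are hypothetical. 

```python
import math

COULOMBS_PER_MAH = 3.6  # Eq. (2): 1 mAh corresponds to 3.6 Coulombs


def capacity_mah(capacitance_f: float, voltage_v: float) -> float:
    """Equivalent capacity in mAh of a capacitor charged to voltage_v."""
    charge_c = capacitance_f * voltage_v  # Eq. (1): Q = CV
    return charge_c / COULOMBS_PER_MAH


def charge_time_s(capacitance_f: float, resistance_ohm: float,
                  fraction: float = 0.9) -> float:
    """Time to reach `fraction` of the source voltage in an RC charging circuit.

    Assumption (not from the paper): constant-voltage charging through a
    series resistance, i.e. V(t) = Vs * (1 - exp(-t / (R * C))).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return -resistance_ohm * capacitance_f * math.log(1.0 - fraction)


# Hypothetical part: 10 F, 2.7 V super capacitor charged through 1 ohm
print(capacity_mah(10.0, 2.7))
print(charge_time_s(10.0, 1.0))
```

Under these assumptions the 90% criterion corresponds to roughly 2.3 time 
constants (RC), independent of the source voltage. 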



Fig. 3 Specific weight of various chemical compositions of commercially 
available batteries 

D. Specific Volume 

Like weight, volume is a significant parameter for portable 
devices. The report compares different cells/batteries based 
on cell volume per unit capacity (cm³/Ah). According to the 
report, the lead acid battery has the maximum volume per unit 
capacity, whereas Zinc-air has the lowest. So, if volume is 
the major parameter for selection, the Zinc-air battery should 
be selected. Among secondary batteries, Li(CF)n based 
batteries are best. 



A. Capacity vs. Form Factor 

The form factor, i.e. the volume and weight of the energy 
storage system, is the major parameter on which this 
comparison is based. Fig. 5 displays the form-factor-based 
comparison for commercially available batteries and 
capacitors. The result shows that capacitors are bulky 
compared with batteries offering similar capacity. So, if 
form factor is the prime parameter, batteries are a better 
choice than capacitors. 
Fig. 5 Capacity vs. form factor of various chemical compositions of 
commercially available batteries and capacitors 

B. Capacity vs. charging time 

The time taken to charge an energy storage device is another 
important parameter. Fig. 6 shows the graphical comparison. 
According to the graph, the time taken by a capacitor to 
charge is much less than that of a battery. So, in 
applications where charging time is the deciding factor, 
capacitors should be preferred to batteries. 

[Plot: Capacity vs Charging Time; logarithmic trend lines for capacitor and battery; x-axis: Capacity (mAh)] 

Fig. 6 Capacity vs. charging time of various chemical compositions of 
commercially available batteries and capacitors 

C. Capacity vs. cost 

Cost is also a significant parameter for an energy storage 
device. The report compares both energy storage systems on 
cost. As is clear from Fig. 7, capacitors are costlier than 
batteries. Thus, for low-cost solutions batteries are more 
suitable than capacitors. 
Fig. 7 Capacity vs. cost of various chemical compositions of commercially 
available batteries and capacitors 

IV. Conclusions 

The comparative results are formulated from survey data 
collected recently across the industry, and they cover almost 
all of the major parameters needed to select a suitable 
storage unit, each of which can be given its own weight. The 
selection of a battery is not fixed; it depends entirely on 
the user and the parameter the user prioritizes. For example, 
when specific capacity is the prime factor, Zn-air is the most 
suitable battery, whereas on cost ZnMnO2 is the most suitable, 
and on individual cell voltage LiSOCl2 is the most suitable 
selection. In some exceptional cases the appropriate solution 
may differ from the study result: for instance, the Zinc-air 
battery ranks very high on specific capacity but may not be 
available in very high capacity sizes. In that situation, the 
user must check the commercial availability of that choice or 
shift to the next priority as per the graphs shown. 
ABSTRACT: The health care system in Nigeria performs poorly, 
largely because of inadequate medical and human resources. 
This inadequacy has led to enormous paperwork, wasted time, 
loss of life, and ineffective treatment procedures. 

This study examines the use of mobile devices such as 
Personal Digital Assistants, cell phones, laptops and 
palmtops for health care delivery in Nigeria. It considers 
how a healthcare application can improve healthcare services 
by providing a link between the patient and primary health 
care. This link will provide easy access to medical 
information at the points of health-care delivery within 
health care centers. An E-Health model can be designed to 
solve this problem. The central idea behind the model is to 
use web languages to describe a user's personal health 
environment and to extend the web browser to use this 
document to support user mobility. The model enables users to 
access a user-defined health environment from anywhere, at 
any time and with any kind of Internet-connected computer. 
Developed with the Unified Modeling Language, mobile, and 
Java-based technologies, E-Health is proposed as a general 
solution to the current challenges in the health sector in 
Nigeria. 



1. INTRODUCTION 



It is often said that health is wealth; a nation with a 
good health care system is therefore blessed. Information 
and Communication Technology (ICT) has played a major role 
in the understanding of illness, its successful diagnosis 
and in the practice of medicine itself. The involvement of 
ICTs in health is commonly called e-health. It is 
essentially the use of ICTs in medicine for knowledge 
management and service delivery, a combination which can 
improve the delivery of medical services and, as a 
consequence, health outcomes [9]. 
The World Health Organization [10] identifies five 
essential components of e-health: structural enhancement in 
the delivery of health services; engagement with 
stakeholders and the private sector in improving the 
availability and appropriateness of technologies; learning 
how to use the tools; creation of standardized norms and 
practices; and evaluation and monitoring of the application 
and impact of ICTs on health. E-health is therefore a model 
that functions as a network of experts and resources that 
can be mobilized over a wide range of distances. E-Health 
enhances the timeliness of emergency response, thus 
reducing the mortality rate and hospital cost. Surveillance 
and information gathering systems that allow the recording 
and analysis of data on communicable diseases are an 
example of ICTs in this field and are essential in managing 
the safety of populations [3]. 

Nigeria is one of the developing countries of the world, 
located in West Africa. Although it is blessed with natural 
resources, its wealth is being misspent by its leaders and not 
spent on important sectors like health. 

Sixty percent of health care in the country is delivered by 
the government and the remaining forty percent by private 
providers. The government-owned health care system is 
divided into local government, state government and federal 
government health care systems, as shown in Figure 1. The 
National Health Information System (NHIS) was proposed by 
the Nigerian Ministry of Health to carry out the major 
reforms implemented under the Nigerian government health 
scheme. 



Figure 1. [Health Care Structure in Nigeria.] 
Nigeria has been unable to enjoy the benefits of E-health 
because we lack the capacity to systematically evaluate 
developments in ICT, we lack standardized street addresses, 
and we are unable to create digital records and databases for 
tracking the spread of disease. We also lack access to online 
resources on the treatment and diagnosis of illness, 
management tools and support for clinical care. 

Mobile computing is the use of portable devices, and their 
increasing processing power, to meet consumers' needs. 
E-Health (the use of mobile computing in the health sector) 
provides a perspective for exploring the relationship 
between people, their health and well-being through the use 
of mobile phones. It helps to improve health-care services 
by providing the basic infrastructure needed in the sector. 
E-Health aims to improve healthcare services in Nigeria by 
developing a Secure Health Information Platform using the 
latest information technologies. The spread of hand-held 
devices, a new generation of programmable cellular phones 
and the availability of development environments for such 
devices have made possible the design and development of a 
new kind of software that satisfies users' needs for 
mobility support and personal information management. With 
the advancement of network technology in recent years, 
connecting different networks across different platforms 
has become a complex task. Many solutions have been 
proposed, among them the J2SE™, J2EE™ and J2ME™ platforms. 
These Sun Microsystems technologies help bridge the 
communication gap between a PC and a mobile phone by 
transferring information directly without the need for 
format conversion; J2ME is the technology for mobile 
devices. 

An E-Health system is typically built from several 
technologies, each with its own possibilities and 
limitations. Many different technologies are available and 
new ones arrive continuously. J2ME™ is a technology that 
has recently been introduced to the wireless market, and 
many believe that J2ME™ will improve the diffusion of 
E-Health. 

As mentioned above, E-health is one of the fundamental 
issues for E-commerce, which is why this paper seeks to 
answer the following question: 

"To what extent is J2ME™ suitable for the client technology 
in a successful E-Health system?" 

To answer that question, the paper first describes the 
market, technologies and fundamental aspects of E-Health. It 
then describes a set of critical success factors (CSFs) and 
example requirements. The CSFs outline the requirements of 
the market and describe the challenges of providing a 
successful E-Health system. 

On that basis the paper analyzes the J2ME™ technology as a 
building block of an E-Health system. Comparisons with 
existing systems and technologies are made during the 
analysis, and the strengths and weaknesses of J2ME™ are 
described. 

The expected conclusion is an evaluation of J2ME™ as a 
building block of E-Health, in the form of a set of 
statements of what eventually needs to be done to introduce 
successful E-Health systems with J2ME™. 

2. Statement of the Problem 

The main problem with the health care system in Nigeria is 
that most hospitals still operate manually. This leads to 
enormous paperwork, and consequently patients have to wait 
in long queues; while waiting, some patients' health 
deteriorates and, worse still, some die. Furthermore, human 
and medical resources are inadequate. Most hospitals have 
just two or at most three doctors attending to hundreds of 
patients, and the hospitals are not properly equipped with 
the necessary infrastructure. Besides, doctors and nurses 
are not properly trained; they do not have the information 
and knowledge needed to improve the condition of their 
patients. E-Health proposes to design a healthcare 
application that will improve healthcare services in 
Nigeria by providing a link between the patient and primary 
health care. This link will provide easy access to medical 
information at the points of health-care delivery within 
health care centers, through a Secure Health Information 
Platform built with the latest information technologies. 



3. REVIEW OF LITERATURE 

Mobile computing has been used to solve a wide range of 
problems in fields such as banking, geology and law. In the 
medical field, many designs have been proposed and 
implemented. In Turkey, a geographical information system 
(GIS) was designed to improve the country's health care 
facilities. GIS was used to map and analyze the 
geographical distribution of populations at risk, health 
outcomes, and risk factors in Turkey. Research has shown 
that a Heartbeat programme in Jordan allows medical experts 
to give advice to doctors in remote areas where no 
specialist is present. Patients therefore do not have to 
find funds to travel, which has reduced unnecessary visits 
to the country's few heart specialists and has helped save 
millions of dollars for the country's health system. 

In addition, the Atraumatic Restorative Treatment (ART) 
approach was introduced in government dental clinics in 
Tanzania to help improve health conditions in that field. 
In Nigeria there is ongoing research aimed at integrating a 
fuzzy expert system with a fuzzy association rule mining 
process to enhance the comprehensibility of an expert 
system in the medical domain. "Telehealth, electronic 
health records, computer-assisted prescription systems, 
accessing clinical databases and other aspects of e-health 
are transforming health today and hold even greater promise 
for the future" [7]. As the technologies continue to evolve 
and their use becomes more ubiquitous, the field of 
medicine will see further increases in its capabilities 
through enhanced "remote consultation, coordination, and 
diagnosis" [3]. 

This study, E-Health (the use of mobile computing in the 
health sector), proposes to solve some of the problems in 
the health sector in Nigeria. The design and implementation 
of health care software will make health care in Nigeria 
more effective and efficient by reducing the time wasted 
waiting to see the doctor and by providing the information 
needed to solve patients' problems. It will help increase 
the resources in the health sector and provide easy access 
to medical treatment. 



4. METHODOLOGY 

Software Platforms 

The software running on mobile devices is similar to the 
better-known software running on desktop computers. The 
software is built in a hierarchical fashion with operating 
systems, runtime systems and applications. The SIM card 
plays a vital role in the software hierarchy and design of 
handsets. Mobile handsets can have two different types of 
operating system (OS): a standardized OS or a proprietary 
OS (called a black box). Originally, all of a mobile 
handset's functionality was defined as a black box, but as 
handsets became more sophisticated the black box opened up 
and became more similar to an operating system, allowing 
applications outside the black box to utilize the handset 
resources. 
Examples of OSs for handsets are Microsoft Windows Mobile, 
Palm OS and Symbian. The common tasks of both types of 
operating system are process management, file system, 
graphical user interfaces, device drivers, networking and 
security. 

JAVA 

The main technology used in E-Health is Java; it serves as 
a building block for the E-Health system. The Java platform 
offers a runtime and programming environment for developing 
applications, and an increasing number of handsets support 
Java technology. The Java 2 platform consists of three 
elements: 

Java Programming language 

Java Virtual Machine (JVM) 

Application Programming Interfaces (APIs) 

A program written in Java goes through two steps in order 
to run on a hardware platform. First, the program is 
compiled into byte code by a Java compiler. Then, in order 
to run, an interpreter in the Java virtual machine 
interprets the byte code into the appropriate machine code. 
Because Java virtual machines exist for different hardware 
platforms, a Java program does not have to consider the 
hardware platform it will run on; this is taken care of by 
the Java virtual machine. This is the idea behind the Java 
vision of "write once, run anywhere". The APIs are code 
that is already written and ready for reuse through a 
well-defined interface. 

The E-Health application is usually implemented in Java, 
more specifically J2SE, J2ME, and J2EE, to ensure 
portability. The Java Standard Edition (J2SE) contains the 
most commonly used set of classes without any added 
functionality. Mobile devices usually use a lighter version 
of Java called the Micro Edition (J2ME); this ensures full 
support of user mobility and provides a runtime environment 
for the Java MIDlet. Java Enterprise Edition (J2EE) is the 
edition/technology of Java used to ensure full support of 
user mobility via the web. 

Java is usually chosen because it is user friendly (through 
the use of a graphical user interface), it handles data 
securely (strong encryption, authentication and 
verification) and it is universal. 



ANALYSIS 

Use cases are a formal means of showing how the 
functionality the system offers meets the needs of the 
user. They are not meant to indicate how the participants 
of the system communicate, but rather serve as a tool to 
identify the functionality the different actors have to 
offer. There should be at least two actors in this thesis: 
a user (the client) and the application server. The user 
represents a person with an E-Health application enabled 
mobile phone or PC. The user can interact with the server, 
which offers the required services. 



[Diagram: actors "user (patient/doctor)" and "server"] 

Figure 2. [High level use cases.] 



4. MATERIALS AND METHODS 



Project Description 
This paper gives users of J2ME-, J2SE- and J2EE-enabled 
applications the ability to seek medical attention via the 
mobile phone and the internet. 



Description of Components 

E-Health has three key components on which to build: 

The mobile phone: this facet serves as the client; it is the 
medium through which transactions are performed. J2ME 
(Java Micro Edition) is the edition/technology of Java 
used in this facet to ensure full support of user mobility. 
Instead of spending hours in a clinic waiting to be 
attended to, the patient simply uses his/her phone (on 
which the E-Health application has been enabled) to collect 
the information he/she requires for the particular illness, 
and then goes to the chemist for drugs or books an 
appointment with the best doctors if need be. 

The Central Database System: this is the second facet of 
the application and it acts as a crosslink between the 
other two facets. The database is necessary because the 
application deals with records of various medical 
practitioners, hospitals, patients etc., and a database is 
an efficient and effective way to keep and store records. 
Various database systems are in use today, including 
Oracle, MySQL, MSSQL (which is Windows-based), JDB and so 
on. Oracle is the most compatible with Java, but it is 
meant for companies running large-scale online enterprise 
applications and requires not less than 5 GB of RAM. 

The Web Application Server: this facet serves as the 
server. When a patient sends a request to a particular 
clinic, the client (the phone) sends the request to the Web 
application server. The Web application automatically 
checks the database to confirm the authenticity of the 
patient's records in the clinic and automatically sends a 
response, either positive or negative, to the client 
(phone). J2EE (Java Enterprise Edition) is the 
edition/technology of Java used in this facet to ensure 
full support of user mobility via the web. 
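
The request check performed by the Web application server can be 
illustrated with a short sketch. The system described here uses J2EE with 
a MySQL database; the Python code below is only a stand-in that uses the 
standard-library sqlite3 module with an in-memory database. The table name, 
column names, identifiers and the "POSITIVE"/"NEGATIVE" response strings are 
hypothetical and chosen for illustration. 

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical patient record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_records ("
        "patient_id TEXT, clinic_id TEXT, "
        "PRIMARY KEY (patient_id, clinic_id))"
    )
    conn.execute(
        "INSERT INTO patient_records (patient_id, clinic_id) VALUES (?, ?)",
        ("P001", "C10"),
    )
    conn.commit()
    return conn


def handle_request(conn: sqlite3.Connection, patient_id: str, clinic_id: str) -> str:
    """Check whether the patient has a record at the clinic and build the reply."""
    row = conn.execute(
        "SELECT 1 FROM patient_records WHERE patient_id = ? AND clinic_id = ?",
        (patient_id, clinic_id),  # parameterized to avoid SQL injection
    ).fetchone()
    return "POSITIVE" if row is not None else "NEGATIVE"


conn = build_demo_db()
print(handle_request(conn, "P001", "C10"))  # record exists at this clinic
print(handle_request(conn, "P002", "C10"))  # no such record
```

In a deployed system the reply would be sent back to the phone over the 
network; here the function simply returns the response string. 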

Netbeans IDE 6.0 is the platform in which the three facets 
of this application are interlinked and run. 

The following steps are involved in a typical E-Health 
system: 

An order is placed and sent to a particular clinic. 

The Mobile Digital E-Health application is sent to the 
mobile phone. 

The user receives the application and installs it on 
his/her phone. 

The enabled phone is then used, via the application 
installed on it, to help solve health-related problems. 

Procurement and Deployment of Related Software 



All of the software needed for development is easily 
obtainable from the Internet. There are three main items to 
download and install onto the system. The development 
system is assumed to be a Windows XP machine running on 
Intel Centrino technology. 

J2SE J2ME 

The base development platform, J2SE, can be obtained from 
Sun's Java website, http://java.sun.com. The additional 
APIs for J2ME can also be downloaded from the same website. 
Programmers knowledgeable in Java can easily install these 
two development kits. It is recommended that both are 
installed in a directory path with no spaces. Development 
of the thesis used Java 2 SE 1.6 and Java Wireless Toolkit 
2.5.2. Development in the toolkit is unusual in that one 
can only create project folders within the \WTK22\apps 
directory, as J2ME requires specific folders and files that 
are automatically generated by the toolkit (the Ktoolbar 
application). 

Netbeans 

Netbeans IDE 6.0 is the platform in which the three facets 
of this application are interlinked and run. It can easily 
be downloaded from the internet. 

MySQL 

The Database is necessary in this application since we will 
be dealing with records of various custom, transactions, 
accounts etc. MySQL is used in this thesis as the database 
system because it economizes resources and is easy to use. 

5. RESULTS AND DISCUSSION 

User mobility is another degree of mobility that is less 
pursued in the area of E-Health. This paper proposed a new 
computing model that tries to provide user mobility service 
to all applications through a system-level solution. This 
solution utilizes a platform-independent interface to fit a 
user's personal computing environment on a platform- 
dependent middleware that provides two services to 
web-top applications. 



6. CONCLUSION AND RECOMMENDATIONS 

A system to support user mobility faces several challenges. 
First, a user's personal computing environment must fit in a 
heterogeneous environment. Second, local resources on an 
E-Health System client must be integrated into the user's 
computing environment to best support the user. There are 
also security and adaptation issues; however, these two 
issues are not addressed in this thesis. The E-Health 
System model opens new opportunities for future work. The 
E-Health System, including a prototype, could be developed 
further in many respects, even within the definition given 
in the thesis. 
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Abstract — Many nations all over the world have increased their 
dependency on cyberspace by maximizing the use of Information 
and Communication Technology (ICT). In this digital age, the 
concept of cyber terrorism or the use of cyberspace to carry out 
terrorist activities has emerged. Interestingly, there are many 
concepts of cyber terrorism provided by researchers, policy 
makers and individuals. This paper proposes a framework 
describing the core components of cyber terrorism. The authors 
have analyzed the data by using a grounded theory approach, in 
which the framework is drawn. The framework defines cyber 
terrorism from six perspectives: Target, motivation, method of 
attack, domain, action by perpetrator, and impact. In addition, 
the proposed framework provides a dynamic way in defining 
cyber terrorism as well as describing its influential 
considerations. Continued research in this area can be further 
conducted, which may lead to the development of strategic and 
technological framework to counter cyber terrorism. 



Keywords-component; Cyber Terrorism, Cyberspace, ICT, 
Terrorism 

I. Introduction 



Cyberspace and the Internet are at the center of modern life 
and have become an important medium for businesses, 
economics, politics and communities. Many nations all over 
the world have constantly increased their dependency on 
cyberspace by maximizing the use of Information and 
Communication Technology (ICT). ICT offers a double-edged 
sword. While development in the area of ICT allows for 
enormous gains in efficiency and productivity, it has also 
created opportunities for those with devious ambitions to cause 
harm [1]. At the same time, it can be a powerful tool for 
perpetrators such as extremists and terrorist groups to promote 
extremist ideologies and propaganda materials as well as to 
create public fear by damaging assets that are vital to national 
interest and security [2] [3]. The same technological advances 
that are benefiting the public at large are also increasing the 
arsenal of our adversaries. 

Critical National Information Infrastructure (CNII) 
underlies the nation's economic, political, strategic and 
socio-economic activities [4]. Many stakeholders are concerned 
about terrorist attacks against critical infrastructures such 
as telecommunications, power distribution, transportation, 
financial services and essential public utility services. 
Terrorist cyber attacks on CNII are possible, where the 
motives, resources and willingness to conduct operations of 
different kinds against specific targets are fundamental [5]. 
If perpetrators follow the lead of hackers, theoretically they 
have the capability to use ICT to conduct cyber attacks 
against specific targets. Because cyberspace has no 
boundaries, there is a possibility that terrorists or 
terrorist groups may pursue cyber terrorism in conducting 
offensive attacks and supporting physical violence in the 
future [6]. 



II. Concepts and Terms 



A. Cyber Terrorism 

War, crime and terrorism are traditional concepts that occur 
in the physical domain; the only new aspect is the "cyber" 
domain. Physical terrorism and cyber terrorism share the same 
basic elements, i.e. they share a common denominator: 
terrorism. Several researchers have argued that the underlying 
principles of terrorism behind the threat remain the same [6], 
and they have described terrorism activities in the cyber 
world as cyber terrorism [7]. 

It is noted that several definitions of terrorism include 
acts directed at computer systems and their services that 
control a nation's energy facilities, water distribution, 
communication systems, and other critical infrastructures. 
Malaysia's Penal Code, Chapter VIA, Sections 130B - 130T 
comprises provisions dealing with terrorism [8]. Section 130B 
(2) (h) defines terrorism as an act or threat of action 
designed or intended to disrupt or seriously interfere with 
any computer system or the provision of any services directly 
related to communications infrastructure, banking or financial 
services, utilities, transportation or other essential 
infrastructure. Australia's Security Legislation Amendment 
(Terrorism) Act 2002 defines terrorism, among others, as 
actions that seriously interfere with, disrupt, or destroy an 
electronic system including, but not limited to, an 
information system; a telecommunications system; a financial 
system; a system used for the delivery of essential government 
services; a system used for, or by, an essential public 
utility; or a system used for, or by, a transport system [9]. 

The term cyber terrorism was first coined in the 1980s by 
Barry Collin [10], a senior research fellow at the Institute 
for Security and Intelligence in California. According to him, 
the convergence of the "virtual world" and the "physical 
world" forms the vehicle of cyber terrorism. Collin further 
clarifies that the virtual world is the place in which 
computer programs function and data moves, whereas the 
physical world is the place in which we live and function. The 
growing convergence of the physical and virtual worlds is 
becoming more complex. Nowadays, ICT plays a major role in the 
convergence of these two worlds. 



Denning [11] defines cyber terrorism as unlawful attacks 
and threats of attack against computers, networks and the 
information stored therein when done to intimidate or coerce a 
government or its people in furtherance of political or social 
objectives. Denning also clarifies that, "Further, to qualify as 
cyber terrorism, an attack should result in violence against 
persons or property, or at least cause enough harm to generate 
fear. Attacks that lead to death or bodily injury, explosions, 
plane crashes, water contamination, or severe economic loss 
would be examples. Serious attacks against critical 
infrastructures could be acts of cyber terrorism, depending on 
their impact. Attacks that disrupt non-essential services, or that 
are mainly a costly nuisance, would not." Denning's definition 
contains several important components of the concept of cyber 
terrorism. First, it refers to unlawful attacks. Second, the 
attacks and threats of attack are directed against computers, 
networks and the information stored within them. Third, the 
purpose of the unlawful attacks is to intimidate or influence 
a government or society to further political or social 
objectives. Fourth, the attack results in violence against 
persons or property, or at least causes enough harm to 
generate fear. Lastly, serious attacks against critical 
infrastructures could be acts of cyber terrorism. 

Likewise, Lewis [12] defines cyber terrorism as the use of 
computer network tools to shut down critical national 
infrastructures (such as energy, transportation, government 
operations) or to coerce or intimidate a government or civilian 
population. Mantel [13] defines cyber terrorism as highly 
damaging computer attacks by private individuals designed to 
generate terror and fear to achieve political or social goals. 
Mshvidobadze [14] defines cyber terrorism as cyber acts 
designed to foment terror or demoralization among a target 
population for some purpose of the perpetrator; most likely 
this will be some kind of attack on critical infrastructure. 
Cyber terrorism should involve computer technology and means 
used as a weapon or target by terrorist groups or agents [15]. 
In the context of cyber terrorism, the above definitions 
suggest that critical infrastructure computer systems and 
civilian populations are attractive targets, which contributes 
to the uniqueness of cyber terrorism. Here, the direct damage 
caused by the attack is to the critical infrastructure's 
computer system and the civilian population. 

The literature on cyber terrorism also argues that the term 
includes a component of motivation, such as political, social 
or belief-based motives. For example, Conway [16] states that, 
in order to be labeled as cyber terrorism, attacks must have a 
terrorist component: they must result in death and/or 
large-scale destruction and be politically motivated. Pollitt 
[17] defines cyber terrorism as the premeditated, politically 
motivated attack against information, computer systems, 
computer programs, and data which results in violence against 
non-combatant targets by sub-national groups or clandestine 
agents. Czerpak [18] argues that cyber terrorism is a 
politically driven attack perpetrated by the use of computers 
and telecommunications capabilities, which leads to death, 
bodily injury, explosions and severe economic loss. Nagpal 
[19] defines cyber terrorism as the premeditated use of 
disruptive activities, or the threat thereof, in cyber space, 
with the intention to further social, ideological, religious, 
political or similar objectives, or to intimidate any person 
in furtherance of such objectives. 



The method of attack in cyber terrorism appears to involve the 
use of computer technology to carry out acts of terrorism. Beggs 
[20] defines cyber terrorism as the use of ICT to attack and control 
critical information systems with the intent to cause harm and 
spread fear to people, or at least with the anticipation of 
changing domestic, national, or international events. Similarly, 
Weimann [21] defines cyber terrorism as the use of computer 
network tools to harm or shut down critical national 
infrastructures (such as energy, transportation and government 
operations). The CRS Report for Congress [22] defines cyber 
terrorism as the use of computers as weapons, or as targets, by 
politically motivated international or sub-national groups, or 
clandestine agents, who threaten or cause violence and fear in 
order to influence an audience, or cause a government to 
change its policies. 

As defined by Denning, the action by the perpetrator involves 
unlawful attacks on the targeted audiences. This notion is 
supported by Ariely [23], who refers to cyber terrorism as 
the intentional use or threat of use, without legally recognized 
authority, of violence, disruption, or interference against cyber 
systems. The result would be the death or injury of a person or 
persons, substantial damage to physical property, civil 
disorder or significant economic harm. This understanding is in 
line with the study conducted by Nelson et al. [24], which defined 
cyber terrorism as the unlawful destruction or disruption of 
digital property to intimidate or coerce governments or 
societies in the pursuit of goals that are political, religious or 
ideological. 

Cyber terrorism can have a critical impact on the targeted 
audiences, such as causing fear in anyone in the vicinity or 
resulting in violence, death and destruction. Stohl [25] argues that 
cyber terrorism includes some form of intimidation, coercion and 
influence, as well as violence. He defines cyber terrorism as the 
purposeful act or the threat of the act of violence to create fear 
and/or compliant behavior in a victim and/or audience of the 
act or threat. In a report to the United Nations General Assembly 
First Committee on Disarmament and International Security, 
cyber terrorism is described as actions conducted via 
computer networks that may cause violence against or generate 
fear among people, or lead to serious destruction, for political or 
social reasons [26]. Ron Dick, Director of the US National 
Infrastructure Protection Center (NIPC), defines cyber terrorism 
as a criminal act perpetrated through computers resulting in 
violence, death and/or destruction, and creating terror for the 
purpose of coercing a government to change its policies (as 
cited in [27]). This definition is perhaps taken from the US 
Government's definition of terrorism with the inclusion of 
"computer" in the definition. 

Kerr [28] believes that cyber terrorism should have three 
common elements: the use of violence, political objectives, 
and the purpose of sowing fear within a target population. 
Ellsmore [29] says that cyber terrorism can be differentiated in 
terms of intent, outcome and the use of skills. Further analysis 
suggests that at least five elements must be satisfied to 
construe an act as cyber terrorism, as described in Table I 
[30]. 



Table I: Elements of Cyber Terrorism (adapted from Yunos et al. [30]) 

Elements of Cyber Terrorism: 
- Politically-motivated cyber attacks that lead to death or bodily injury; 
- Cyber attacks that cause fear and/or physical harm through cyber attack techniques; 
- Serious attacks against critical information infrastructures such as financial, energy, transportation and government operations; 
- Attacks that disrupt non-essential services are not considered cyber terrorism; and 
- Attacks that are not primarily focused on monetary gain. 

Based on the discussion above, there is no common 
agreement on the concept of cyber terrorism at the international 
level or among researchers. While there are many 
definitions of cyber terrorism, these suggest that the 
phenomenon warrants further analysis. This is 
evident, as the concept has been the focus of 
many policy makers and scholarly studies, but their standpoints 
and views vary. Due to the multidimensional structures (or 
components) of cyber terrorism, we can say that cyber 
terrorism is a contested concept that is interpreted 
differently by a number of parties. The context of cyber 
terrorism denotes different understandings and interpretations. 

B. A Clear Line between Terms 

When discussing cyber terrorism, there is always confusion 
between the term cyber terrorism and the terms "cyber crimes" 
and "terrorist use of the Internet" [31]. However, these terms 
should not be mistaken as synonyms for cyber terrorism. 

Cyber terrorism has become a buzzword and is often 
sensationalized in the media, whereby reports of cyber crimes 
are presented as cyber terrorism [31]. Berner [32] argues that terms 
such as "computer crime" or "economic espionage" must not 
be associated with the term cyber terrorism. In defining cyber 
terrorist and cyber crime activities, it is necessary to separate 
the motivation from the action [33]. From the motivation 
perspective, cyber terrorism is clearly different, as cyber 
terrorists operate with a specific agenda to support their 
actions [34]. Cyber crime and cyber terrorism can be 
differentiated by financial or economic purposes [35] [36]. 

The United Nations categorized cyber crime as 
unauthorized access; damage to computer data or programs; 
sabotage to hinder the functioning of a computer system or 
network; unauthorized interception of data to, from and within 
a system or network; and computer espionage [37]. From a 
legal perspective, cyber crimes and cyber terrorism are two 
different things. In the United States, the Computer Fraud and 
Abuse Act (18 USC: 1030) defines cyber crimes, such as 
unauthorized computer intrusions or misuse, as unlawful 
activity [36]. Malaysia too has enacted the Computer Crimes 
Act 1997. The purpose of the Act is to provide for offenses relating 
to the misuse of computers. Amongst other things, it deals 
with unauthorized access to computer material, unauthorized 
access with intent to commit other offenses and unauthorized 
modification of computer contents [38]. From a legal 
perspective, the definition of computer crimes in Malaysia's 
Computer Crimes Act 1997 and that of terrorism in the Penal Code, 
Chapter VII A, Section 130B are different. These two concepts 
cover different areas. In the simplest terms, cyber terrorists' 
actions may prejudice national security and public 
safety, whereas cyber criminals' actions may prejudice 
individuals or groups for the purpose of monetary gain. 



Many studies have indicated that Web 2.0 media such 
as interactive websites and blogs, social networking sites and 
discussion forums have been rapidly adopted by extremists as a 
medium to support their online activities [13]. However, it is 
important to note that cyber terrorism is different from 
terrorists' use of the Internet [31]. Taliharm [33] argues that 
cyber terrorism should not be confused with illicit 
activities or Internet radicalization in cyberspace by 
terrorist groups, and further argues that 
terrorists' use of the Internet is merely action by certain individuals 
or groups to organize illicit activities using cyberspace. 

Radicalization and extremism in cyberspace, however, can 
lead to terrorism [39]. Understanding online radicalization is 
one of the pillars of the fight against terrorism [21]. Perhaps the 
main concern is the potential for terrorists to use the Internet to 
inflict damage. The United Nations' report mentioned that the 
concern is to prevent moderates from becoming extremists, and 
extremists from becoming terrorists [40]. Threats from 
terrorism must be analyzed before they evolve into fully-fledged 
threats. Many of the actors in foiled plots have been 
discovered to have been radicalized online, on terrorists' and 
extremists' websites and chat rooms, which, amongst other things, 
provide information on weapons and explosives and facilitate 
large-scale recruitment efforts and propaganda [3]. 

C. Empirical Cyber Terrorism Frameworks 

Based on the literature, several empirical frameworks 
on cyber terrorism have been proposed by researchers. Veerasamy 
proposed a conceptual framework outlining the aspects of cyber 
terrorism, which addresses the operating forces, the techniques 
and the objectives [41]. The operating forces provide the 
context in which cyber terrorism functions, describing 
the qualities of a cyber terrorist as well as the 
properties of cyber terrorism in general. The techniques 
describe practical methods and classification descriptions of 
carrying out cyber terrorism via invasive or offensive computer 
and network security practices. The objectives are similar to the 
motivation, where the intent is to cause direct damage via 
malicious goals and support functions. The framework 
provides a high-level overview and serves as a basis for 
consideration in the domain of cyber terrorism. However, the 
framework's attributes are not interactive and are quite complex. 
The framework implies that, for an act to be considered cyber 
terrorism, one or more elements must be fulfilled. 
However, this is not accurate, as cyber terrorism should be seen 
from a holistic perspective. 

Another framework on cyber terrorism, proposed by 
Heickero, illustrates the effects and consequences of a cyber 
terrorism operation through an actor-target-effect chain in an 
asymmetric context [5]. The model illustrates how cyber 
terrorists could, in different phases, plan and accomplish a cyber 
operation, as well as the effects and consequences of the digital 
attack. Figure 1 provides an illustration of how cyber terrorism 
is conducted. 

Figure 1. Actor-target-effect Chain (adapted from Heickero [5]) 



The framework provided by Heickero is more relevant to 
understanding the modus operandi of cyber terrorism, as it 
provides a chain leading from one attribute to another. The 
framework consists of the actors, who are antagonists; the 
driving forces behind their motives, which are social, 
psychological, economic and political; the means used, such as 
weapons and economy (resources); the targets, which are objects 
such as infrastructure, organizations and individuals; the 
activities for realizing their goals, such as planning and 
disorganization; and the effects or consequences, such as 
physical effect and syntax effect. 
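
The attribute chain described above can be sketched as a simple data structure. The following is only an illustrative sketch: the class name AttackChain, its field names and the example values are assumptions made here for exposition (the values are drawn from the attributes listed in the text) and are not part of Heickero's model [5]. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackChain:
    """One pass through the actor-target-effect chain (illustrative only)."""
    actor: str     # the antagonist
    motive: str    # driving force: social, psychological, economic or political
    means: str     # e.g. weapons or economy (resources)
    target: str    # e.g. infrastructure, organizations or individuals
    activity: str  # e.g. planning
    effect: str    # e.g. a physical effect

    def stages(self):
        """Return the chain as ordered (attribute, value) pairs."""
        return [
            ("actor", self.actor),
            ("motive", self.motive),
            ("means", self.means),
            ("target", self.target),
            ("activity", self.activity),
            ("effect", self.effect),
        ]

    def describe(self):
        """Render the chain from one attribute to the next."""
        return " -> ".join(f"{name}: {value}" for name, value in self.stages())


# Hypothetical example built only from attribute values named in the text.
example = AttackChain(
    actor="antagonist",
    motive="political",
    means="weapons",
    target="infrastructure",
    activity="planning",
    effect="physical effect",
)
print(example.describe())
```

Keeping the stages in a fixed order reflects the idea that each attribute leads to the next, from the actor through to the effect. 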

Gordon and Ford [42] viewed cyber terrorism from the 
following perspectives: people (or groups), locations (of 
perpetrators, facilitators, victims), methods/modes of action, 
tools, targets, affiliations and motivations (Table II). They 
analyzed the attributes of traditional terrorism and 
integrated the computer into the matrix. They concluded that the 
scope of terrorism changes within each attribute due to the 
addition of the computer. However, attributes such as 
perpetrator and place require further investigation, as what is 
important is not the perpetrator or the place, but the action [43]. 
Perhaps further analysis based on case studies is required. 



Table II. Matrix of Terrorism with Inclusion of the Computer (adapted from 
Gordon and Ford [42]) 

Attributes | Description 
Perpetrator: Group/Individual | In the cyber context, virtual interactions can lead to anonymity and desensitization. 
Place: Worldwide | The event does not have to occur in a particular location. The Internet has introduced globalization of the environment. 
Action: Threats/Violence/Recruitment/Education/Strategies | Terrorist scenarios typically are violent or involve threats of violence. Violence in the virtual environment includes psychological effects, possible behavior modification and physical trauma. 
Tool: Kidnapping/Harassment/Propaganda/Education | Terrorists use the computer as a tool. Facilitating identity theft, computer viruses, hacking are examples that fall under this category. 
Target: Government Officials/Corporations | Potential targets are corporations and government computer systems. 
Affiliation: Actual/Claimed | Affiliation refers to recruitment in carrying out given instructions. Affiliation can result in the strengthening of individual organizations as they can immediately acquire access to the information resources of their allies. 
Motivation: Social/Political Change | Political, social and economic are the motivations present in real-world terrorism. 



III. Analysis of Findings 

Should website defacement be considered cyber terrorism? 
Would the use of the Internet by the terrorists such as fund 
raising, recruitment and propaganda be considered cyber 
terrorism? If somebody commits a certain act that meets the 
criteria of cyber terrorism, under what law will he/she be 
charged? Such examples highlight the need for a precise 
definition of cyber terrorism in order to avoid possible 
ambiguity and misinterpretation. It will also serve as a guide 
for distinguishing between the various terms for cyber incidents. 

Interestingly, most governments in the world do not agree 
on one single definition of cyber terrorism [11] [44]. The term 
cyber terrorism generates different meanings in the minds of 
different people. However, reaching a common 
understanding of what phenomena contribute to this term 
is important in order to better understand the 
root causes of cyber terrorism. Unfortunately, we are in a 
situation where there is still no consensus on a 
definition of the concept. 

There is no common definition of cyber terrorism that is 
widely accepted; hence, policy makers and researchers lack 
common ground on what they are fighting against. In general, 
previous studies have defined 
cyber terrorism from various points of view. However, the 
relationship between the components highlighted in defining 
this term is still unclear. Therefore, there is a strong 
need for a specific concept of cyber terrorism, especially 
for a legal definition. The concept would provide a foundation 
for the legal fraternity, such as prosecutors and judges. 

In this study, the analysis is divided into four processes: 
planning, data collection, data analysis, and reporting, which are 
similar to the traditional stages of research [45]. While 
most of the research methodology is described in Section III, 
the reporting is presented in Section IV. 

A. Plan 



The planning stage started with the identification and 
investigation of research problems surrounding the identified 
phenomenon. There are many definitions of cyber terrorism, and 
some of them address only a subset of cyber terrorism and not 
the whole context. Due to the complexity of the various interacting 
attributes or elements in cyber terrorism, formulating a 
framework that describes its influential considerations would 
be beneficial. Therefore, there is a need for a more structured 
approach to understanding the various attributes of cyber 
terrorism. This is crucial for researchers and policy makers 
in understanding the context of cyber terrorism. 

B. Data Collection 

The analysis was conducted by reviewing existing literature 
on terrorism and cyber terrorism. Our goal was to examine 
whether particular researchers had developed useful insight 
into this subject and to learn whether consensus had 
already been reached on it. Based on our 
observations, we found that there is limited literature 
focusing on cyber terrorism frameworks. Nevertheless, most of 
the literature reviewed is valuable in terms of framing the 
context rather than directly providing a solution to the issues of 
this study. The materials reviewed include overseas 
government reports, articles found on websites, published 
conference materials and refereed publications. 

One example of the qualitative research approach is 
grounded theory. Grounded theory was first presented by 
Glaser and Strauss in their 1967 book "The Discovery of 
Grounded Theory", which Goulding [46] describes as 
premised on a strong intellectual justification for using 
qualitative research to develop theoretical analysis. The phrase 
grounded theory refers to theory or general concepts that are 
developed from a corpus of data [47], [48]; the theory 
emerges through a close and careful analysis of the data [49]. 
As mentioned by Borgatti [47], the basic idea of the grounded 
theory approach is to read (and re-read) a textual database 
(such as a corpus of field notes) and discover or label variables 
(called categories, concepts and properties) and their 
interrelationships. 

In grounded theory development, the literature review 
provides theoretical constructs, categories and their properties 
that can be used to organize the data and discover new 
connections between theory and real-world phenomena [50]. 
In developing grounded theory, these should be formulated into a 
logical, systematic and explanatory scheme [51], [49]. The 
theory should be based exclusively on the data collected, whereby 
the researchers bring considerable background in professional 
and disciplinary knowledge to the inquiry. Researchers 
approach the question with background and some knowledge 
of the literature in the domain [49]. Levy [51] explains that 
these positions recognize that a prior understanding of the 
literature can therefore be used effectively in developing 
theory in a number of ways. Based on the review of pertinent 
literature, the prior knowledge and experience of the researcher 
are useful for formulating a preliminary conceptual model. 

"... experience and knowledge are what sensitize the 
researcher to significant problems and issues in the 
data and allows him or her to see alternative 
explanations and to recognize properties and 
dimensions of emerging concepts" [52]. 

(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 10, No. 2, 2012 



Haig argues that grounded theory research begins by 
focusing on an area of study and gathering data from a variety of 
sources, including the literature [53]. It is important to note the 
comment made by Levy [51], who explains that 
these positions recognize that a prior understanding of the 
literature can therefore be used effectively in developing theory 
in a number of ways. Based on the review of pertinent 
literature, the prior knowledge and experience of the researcher 
are useful for formulating a preliminary conceptual model. 

Heath and Cowley reveal that a pre-understanding gained by early 
reference to the literature can contribute to the researcher's 
understanding of the social processes observed [54]. They argue 
that prior reading may be required if the researcher wishes to 
clarify concepts and build an emergent theory. Heath and 
Cowley [54] cite the work by Jezewski [55], who developed a 
literature-based concept before attempting to further develop 
the concept via grounded theory. Heath and Cowley [54] 
further cite the comment by Glaser and Strauss [56] that "the 
researcher will not enter the field free from ideas, but differ 
considerably in the role they see for the literature". Thus, 
specific understanding from experience and literature may be 
used to stimulate theoretical sensitivity and generate 
hypotheses. This notion is supported by Onion [57], who 
concludes that the application of the grounded theory method 
to review literature and derive a meta-theory is novel, whereby 
literature may be used as the primary data by the grounded 
theory method. This is affirmed by Esteves et al. [58], 
who conclude that an analysis of issues related to 
the use of the grounded theory method is very useful for people 
starting a research project. 

C. Data Analysis 

The data analysis was conducted in two steps. In the first 
step, data analysis proceeded through axial coding (examining 
conditions, strategies and consequences). This method has been 
well described by Egan [45] and Borgatti [47]. In the second 
step, the data was mapped into a matrix format [58], where 
attributes as well as similarities or patterns between them 
emerged. 

As described by Borgatti [47], axial coding is the process of 
relating codes (categories and properties) to each other, via a 
combination of inductive and deductive thinking. Borgatti [47] 
explains that grounded theorists emphasize causal 
relationships, and fit things into a basic frame of generic 
relationships. The author simplifies the axial coding process 
into the framework shown in Table III. This framework consists of a 
systematized cause-and-effect schema which the researchers 
used to explicate relationships between categories (or 
attributes) and sub-categories. 

Egan [45] explains that a general understanding of the 
phenomenon under investigation is considered sufficient for the 
initiation of this type of research. Egan [45] further explains, 
"Having established a problem or topic in general terms and 
chosen a site where the research questions could be examined 
more closely, evidence is allowed to accumulate by the 
researcher, resulting in an emerging theory". To develop this 
theory, "early activities by the researcher involve the 
identification of categories capturing uniformities in the data 
and then identifying compelling properties and dimensions of 
the data". This argument is further stressed by Glaser and 
Strauss [56], who state, "A discovered, grounded theory, 
then, will tend to combine mostly concepts and hypothesis that 
have emerged from the data with some existing ones that are 
clearly useful". 

Levy [51] explains that sampling should be directed by the 
logic and the types of coding procedures used in analyzing and 
interpreting data. The result is the revelation of meaningful 
differences and similarities among and between categories. The 
possibility for a hypothesis about the relationships between 
categories is always present. By using the framework provided 
by Borgatti [47], the relationships of categories are analyzed 
and observed. 
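
The two analysis steps described above (axial coding, followed by mapping into a matrix) can be illustrated with a minimal sketch. It assumes that each coded excerpt has been reduced to a (source, element, code) triple, where the element is one of the four axial coding elements of Borgatti's framework [47]. The source labels, the codes and the function names build_matrix and recurring_codes are hypothetical and serve only as an illustration; they are not the data or the tooling used in this study. 

```python
from collections import Counter

# The four elements of the axial coding framework.
AXIAL_ELEMENTS = (
    "phenomenon",
    "causal_conditions",
    "action_strategies",
    "consequences",
)

# Hypothetical coded excerpts: (source, axial element, code).
coded_excerpts = [
    ("source_a", "causal_conditions", "political motivation"),
    ("source_a", "consequences", "violence"),
    ("source_b", "causal_conditions", "political motivation"),
    ("source_b", "action_strategies", "network attack"),
    ("source_c", "consequences", "fear"),
    ("source_c", "consequences", "violence"),
]


def build_matrix(excerpts):
    """Map coded excerpts into a source-by-element matrix of code sets."""
    matrix = {}
    for source, element, code in excerpts:
        if element not in AXIAL_ELEMENTS:
            raise ValueError(f"unknown axial element: {element}")
        row = matrix.setdefault(source, {e: set() for e in AXIAL_ELEMENTS})
        row[element].add(code)
    return matrix


def recurring_codes(matrix, min_sources=2):
    """Return (element, code) pairs that appear in at least min_sources rows."""
    counts = Counter()
    for row in matrix.values():
        for element, codes in row.items():
            for code in codes:
                counts[(element, code)] += 1
    return sorted(pair for pair, n in counts.items() if n >= min_sources)


matrix = build_matrix(coded_excerpts)
print(recurring_codes(matrix))
```

In this sketch, codes that recur across sources are the candidates for the similarities or patterns from which the attributes emerge. 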



Table III. Axial Coding Framework (adapted from Borgatti [47]) 

Elements | Description 
Phenomenon | This is what in schema theory might be called the name of the schema or frame. It is the concept that holds the bits together. In grounded theory it is sometimes the outcome of interest, or it can be the subject. 
Causal conditions | These are the events or variables that lead to the occurrence or development of the phenomenon. It is a set of causes and their properties. 
Action strategies | The purposeful, goal-oriented activities that agents perform in response to the phenomenon and intervening conditions. 
Consequences | These are the consequences of the action strategies, intended and unintended. 

IV. The Proposed Framework 

A conceptual framework links various concepts and serves 
as an impetus for the formulation of theory [59]. A complete 
analysis of the data has revealed six emergent perspectives of 
cyber terrorism, which became the major findings of the study. 
In our view, a cyber terrorism framework should 
have these six perspectives: target, motivation, method of 
attack, domain, action by perpetrator, and impact. 

With the growing interconnectedness of critical 
infrastructures through ICT, the selection of a target that allows the 
maximum level of disruption would significantly influence the 
terrorists. Motivation is about influencing human beings and 
the decisions they make. The motivating forces behind cyber 
terrorism are social, political and belief. Cyber terrorists can 
exploit vulnerabilities in a targeted system through a vast 
array of intrusive tools and techniques. The method of attack 
could be through network warfare and psychological warfare. 
Cyberspace is the domain in which a terrorist-type attack is 
conducted. Cyber terrorists employ unlawful use of force or 
unlawful attacks to conduct the premeditated attack. The 
impact or consequence is high, as the cyber attacks are done to 
intimidate or coerce a government or people and lead to 
violence against persons or properties. The framework 
describing the components of cyber terrorism is proposed in 
Figure 2. 


The framework provides a baseline for establishing and 
defining cyber terrorism. The aim is to show a more dynamic 
way of defining cyber terrorism as well as describing its 
influential considerations. Thus, it can be seen that formulating 
the framework from various strategic considerations would be 
beneficial in understanding cyber terrorism in its full context. 
In summary, these factors will determine whether someone is 
involved in cyber terrorism or not. 

The framework is dynamic in many aspects, since the 
influential factors in the decision are based on all attributes (or 
components) within the framework. In other words, the 
framework suggests that all attributes (or components) 
contribute to the decision-making process in order to determine 
whether someone is involved in cyber terrorism or not. The 
authors suggest that the framework presented here is an 
improvement over existing frameworks, as it captures the 
important factors when considering that the perpetrator may 
combine these factors in conducting cyber terrorism. The 
components of cyber terrorism in this framework are bound 
together to form the concept of cyber terrorism. We need to 
combine the components with the conjunction "AND", which 
means that each of those components is necessary to constitute 
cyber terrorism. Otherwise, if one or more components are not 
present, the act would not constitute cyber terrorism. 
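
The "AND" rule above can be expressed as a short logical check. The sketch below assumes that an analyst has already judged, for a given incident, whether each of the six components is present; the class name IncidentAssessment and the helper functions are hypothetical names introduced here for illustration and are not part of the framework itself. 

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IncidentAssessment:
    """One boolean judgment per component of the proposed framework."""
    target: bool                 # CNII computer systems or civilian population
    motivation: bool             # political, social or belief
    method_of_attack: bool       # network warfare or psychological operation
    domain: bool                 # conducted through cyberspace
    action_by_perpetrator: bool  # unlawful use of force or unlawful attack
    impact: bool                 # intimidation or coercion leading to violence


def is_cyber_terrorism(assessment):
    """Apply the conjunction: every component must be present."""
    return all(getattr(assessment, f.name) for f in fields(assessment))


def missing_components(assessment):
    """List the components that prevent the AND rule from being satisfied."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


# Hypothetical incident in which no political, social or belief motive is found.
incident = IncidentAssessment(
    target=True,
    motivation=False,
    method_of_attack=True,
    domain=True,
    action_by_perpetrator=True,
    impact=True,
)
print(is_cyber_terrorism(incident))   # False
print(missing_components(incident))   # ['motivation']
```

Because the components are joined by a conjunction, a single absent component (here, motivation) is enough for the incident to fall outside the concept of cyber terrorism. 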

A. Target 

The act of cyber terrorism is unique as it combines a 
specific target with a wider audience [60], as illustrated 
in Figure 3. With this argument, the CNII computer systems and 
the civilian population contribute to the uniqueness of cyber 
terrorism [61]. The possibility of disabling entire CNII 
communication networks and attacking the civilian community at 
large would seem to provide a variety of attractive targets. At 
the same time, high-profile targets would probably be 
among the most influential factors in a terrorist group's 
decision, as the damage and destruction would be 
extraordinarily significant and costly to society and the country 
attacked. 




Figure 3. Target Model (adapted from Ackerman et al. [60]) 



The assumption that attacks against computer systems are 
less dangerous, leading to economic losses rather than the loss of 
human lives, is not true. Due to the advancement of 
technology, many essential computing services use 
Supervisory Control and Data Acquisition (SCADA) systems, 
which nowadays are connected to the Internet and can be 
controlled remotely. An attack on a SCADA system that 
controls and manages critical infrastructures may have been 
unthinkable in the past, but with current technological 
developments, it is now possible for a SCADA system to 
become a target for terrorist attacks. Brunst [62] discusses 
three scenarios that could be taken into consideration: 
attacks on hydroelectric dams, tampering with railway and air 
traffic control systems, and taking over control of power plants. 
Brunst in his literature review provides excellent examples of 
terrorist attacks on these control systems, which would generate 
fear within a population. Successful cyber attacks on these 
control systems would certainly have long-term effects, create fear 
and pose immediate danger to human lives. 

Apart from focusing on the ICT infrastructure, cyber 
terrorism also targets the civilian population [5] [25] [60]. Attacks 
against critical infrastructure that spread fear and harm to 
innocent people within a community would be classified as 
cyber terrorism [20]. From an effect perspective, the consequences 
for the civilian population are greater, and thus such attacks get 
more media attention and are more widely publicized. The selection of a 
target that allows the maximum level of disruption would 
significantly influence the terrorists. 

B. Motivation 

Motivation is about influencing human beings and the 
decisions they make [1]. The motivating forces behind cyber 
terrorism are social, political and belief [63]. Through these 
forces, terrorists are psychologically motivated to engage in 
terrorism. From the motivation perspective, cyber terrorism 
exists if the person or group of people operates with a specific 
political or ideological agenda to support their activities [20]. 
For example, the Irish Republican Army engages in terrorist 
activity for a predetermined political purpose with the objective 
of maintaining and strengthening political control [6]. 

Cyber terrorism is defined as unlawful attacks and threats 
of attack against computers, networks and the information 
stored therein when done to intimidate or coerce a government 
or its people in furtherance of political or social objectives [11]. 
Digital technologies thus offer contemporary terrorists and 
terrorist organizations a wide range of opportunities to support 
their campaigns of violence and, if they are proficient, to 
significantly support their political objectives [25]. Terrorists 
wish to undermine confidence in the political structure and 
create difficulty within the body politic. Cyber terrorists 
cause harm or damage to people or groups of people with a 
political agenda [32]. 



C. Method of Attack 

Heickero [5] concludes that cyber terrorism comprises 
different types of methods, such as computer network 
operations and psychological operations. The capability to 
conduct a cyber attack can be divided into three groups: simple 
(unstructured), advanced (structured) and complex 
(coordinated) [64]. Heickero's [5] description of a computer 
network operation and O'Hara's [64] model of the technical 
capabilities of a cyber attack fit well with the definition of 
network warfare. Veerasamy [65] defines network warfare as a 
modern form of conflict in which computers and networks are 
used as the weapons, with information serving as the leverage 
control. Modern forms of network warfare include all the 
computer and network security means through which 
computers are attacked and exploited (worms, denial-of-service, 
bots) as well as all the protective mechanisms being 
implemented (intrusion detection tools, anti-virus software and 
firewalls). 



Taliharm [31] suggests that the term cyber terrorism should 
also cover several other activities carried out by terrorists 
via the Internet, including propaganda via terrorist websites. 
Spreading propaganda via Web 2.0 media is part of a 
psychological operation [43]. Web 2.0 media enable terrorists 
or terrorist groups to establish their presence in cyberspace and 
to spread propaganda, especially to attract press and public 
attention [62]. Coverage by mainstream media is important, as 
news coverage in the media is always repeated, thus increasing 
the propaganda message's reach. 

From a psychological perspective, a disgruntled employee 
within an organization also poses a threat to the organization. 
In one incident in Australia, a man accessed the sewerage 
control systems, and the attack harmed the environment 
and killed wildlife [66]. It was reported that he had worked for 
the company and had knowledge of the tools that operated the 
sewerage control system. The driving forces for his action were 
revenge and a feeling of unfair treatment by the 
management. On the other hand, individuals in this category 
can be bought, and information can be sold to terrorist groups. 
An insider could also act as a cyber terrorist [5]. The extra 
advantage is that they have inside knowledge. An insider 
can be planted within the organization or can be a 
sympathizer who is already working there. The objective 
is perhaps to provide sensitive information or to perform 
certain tasks such as putting malware into critical control 
systems for future attacks. In the US, it was reported that 20 
employees were arrested for possession of false identification 
used to obtain security access to facilities containing restricted 
and sensitive military technology [43]. 

D. Domain 

Cyber terrorism is the convergence of cyberspace and 
terrorism. Cyberspace, whether accessed by computer systems 
or other devices, is the domain (medium) through which a 
cyber attack would be delivered. The National Security 
Presidential Directive 54/Homeland Security Presidential 
Directive 23 of the US Government defines cyberspace as the 
interdependent network of information technology 
infrastructures, and includes the Internet, telecommunications 
networks, computer systems, and embedded processors and 
controllers [67]. The UK Government defines cyberspace as 
an "interactive domain that is made up of digital networks that 
is used to store, modify and communicate information. It 
includes the Internet, but also the other information systems 
that support our businesses, infrastructure and services" [68]. 

Cyber terrorism thus can be seen as a relevant threat due to 
its strong relation to ICT and cyberspace. Apart from land, sea, 
air and space, cyberspace is another dimension of warfare. 
Weimann [21] writes that cyberspace is in many ways an ideal 
arena for the activity of extremist or terrorist organizations. 
Among other things, it offers an easy and fast flow of information. 
By its very nature, cyberspace is also capable of reaching a wide 
audience throughout the world and disseminating information in 
a multimedia environment via the combined use of text, 
graphics, audio and video. 

E. Action by Perpetrator 

Flemming and Stohl [6] argue that terrorism is a process 
that involves acts or threats, emotional reactions and the social 
effects of the acts or threats and the resultant action. Terrorism 
in the cyber environment involves all of the above components. 
The advancement of ICT and rapid changes in the 
technological environment influence terrorist resources and 
opportunities. The convergence of physical terrorism and new 
advancements in ICT has spawned a new term, cyber 
terrorism. 

Rollins and Wilson [43] argue that there are two views in 
defining cyber terrorism, based on impact (effect-based) 
and intention (intent-based). They clarify that effect-based 
cyber terrorism exists when computer attacks result in 
effects that are disruptive enough to generate fear comparable 
to a traditional act of terrorism, even if done by criminals. This 
implies that cyber terrorism should focus on the act rather than 
the perpetrator. Intent-based cyber terrorism, by contrast, exists when 
"unlawful or politically-motivated computer attacks are done to 
intimidate or coerce a government or people to further a 
political objective, or to cause grave harm or severe economic 
damage". 

The cyber terrorist can have the same motives as the 
traditional terrorist, but uses computer and network media 
to attack [69]. Cyber terrorists use unlawful force or 
unlawful, premeditated attacks to intimidate 
or coerce a government or people to further political, social or 
belief objectives, or to cause severe economic damage. The 
impact or consequence is high because the attacks are meant to 
intimidate or coerce a government or people and lead to 
violence against persons or property. 

F. Impact 

The act of cyber terrorism is unique as it combines a 
specific target with a wider audience [6]. In this argument, the 
components of a purposeful violence against persons or 
properties, disruption or serious interference of critical services 
operation, causing fear, death or bodily injury, severe economic 
loss, and prejudice to national security and public safety 
contribute to the uniqueness of cyber terrorism. 

Cyber terrorism exists when there is an attack on a 
computer system that leads to violence against a person or 
property; and the disruption is enough to generate fear, death or 
bodily injury [11] [12]. Cyber terrorism is done to cause grave 
harm or severe economic damage or extreme financial harm [6] 
[22]. As reported by Rollins and Wilson [43], if terrorists were 
to launch a widespread cyber attack, the economy would be the 
intended target for disruption, while death and destruction 
might be considered collateral damage. Terrorist-type cyber 
attacks may target chemical, biological, radiological or nuclear 
(CBRN) computer network installations [18] [43]. A successful 
attack on these installations would cause severe 
economic disruption and harm to the civilian population (death and 
bodily injury). 

With the growing interconnectedness and interdependencies 
of critical infrastructure sectors, the target selection of cyber 
terrorism is likely to be significantly influenced by those 
targets that allow for a maximum level of disruption [6] [20]. 
Terrorists' cyber attacks are therefore likely to target critical 
infrastructure. Successful cyber attacks in one sector will have 
cascading effects on other sectors. Because of this, a large-scale 
terrorist-type cyber attack could have an unpredictable and 
perhaps catastrophic impact on other sectors, and possibly a 
long-lasting impact on the country's economy. 

V. Conclusion 

The term cyber terrorism means different things to 
different people. Cyber terrorism is a matter of threat 
perception, which makes the concept differ from one person to another. 
It is an essentially contested concept 
that is interpreted differently at different levels, such as 
researcher, professional and policy maker. Understanding 
similarities and differences in perceptions of what constitutes 
cyber terrorism can provide insight into the concept of cyber 
terrorism. 

In this work, the data collected from the extensive 
literature were analyzed using the grounded theory approach, 
from which the framework was drawn. The analysis was conducted 
to determine how the components of the concept of cyber 
terrorism come together to form the concept. From the findings, 
the authors have concluded that the concept of cyber terrorism 
can be described from six perspectives: Target, motivation, 
method of attack, domain, action by perpetrator, and impact. 

This work provides a baseline when establishing and 
defining the concept of cyber terrorism. The perspectives are 
useful in determining whether someone is involved in cyber 
terrorism or not. In addition, the proposed framework gives an 
overall view of cyber terrorism in a simple and 
dynamic manner. In future work, this framework can be 
validated and assessed using both qualitative and 
quantitative techniques. Continued research in this area 
may lead to the development of a 
strategic and technological framework to counter cyber 
terrorism. 

Abstract - This survey covers the introduction, motivation, 
and proposed system for the ecological treatment of self-organizing 
cooperative cognitive wireless communication (CCWC). We introduce the 
application of related literature to the proposed wireless 
communication scenario. The cooperative cognitive system resembles 
the symbiotic ecological model of sharing mutual resources. 
This philosophy is nature-inspired, and the system evolves through 
autonomous mutual interaction without any centralized controller. 
In other words, it can be viewed as an evolutionary game, because of 
the biological game being played among the analogous biological 
species. An overview of related material on ecological 
modeling for CCWC is therefore presented, which may also be 
applicable in other engineering disciplines. 



Index Terms-Cooperative Cognitive Ecology, Self-organizing 
Networks. 



I. Introduction 

As the number of communication devices and services increases, 
bandwidth demand is also increasing exponentially. The 
success of wireless communication is characterized by the 
allocated communication bandwidth and the Quality of Service 
(QoS) of the transmission medium. Bandwidth is the main 
resource for communication. The bandwidth 
allocation statistics issued by the National Telecommunications and 
Information Administration (NTIA) make it clear that 
very little bandwidth capacity is left for new communication 
technologies, and that most of the bandwidth is dedicated to 
obsolete technologies and services. This bandwidth resource 
scarcity is a major research issue, both in industry and 
academia. A critical analysis of bandwidth efficiency 
shows that most of the bandwidth resource is seldom 
used. 

A. Related work 

The report by the Spectrum Policy Task Force (SPTF) under 
the Federal Communications Commission (FCC) described 
this as a major issue for the United States in November 2002 
[1]. The cognitive radio (CR) communication technique 
emerged as a possible solution to this bandwidth scarcity 
and under-utilization problem. The idea of CR was first 
introduced in [2], to efficiently utilize the unused spectrum 
bands allocated to licensed users. In CR, the under-utilized 
bandwidth resource can be used by other communication 
devices and services on an opportunistic, as-needed basis. As 
CR technology matures and is deployed, the 
bandwidth efficiency issue could be solved on a large scale. In 
the CR communication pattern, the licensed (primary) bandwidth 
can be used by unlicensed (secondary) communication 
users [3]. 

In wireless communications, the direct source-to-destination 
path can be degraded by fading and attenuation. This 
direct line-of-sight problem is solved by cooperating relays, 
an approach known as cooperative communications (CC) [4]. These 
relay nodes help the source node forward the data packet 
to the destination node [5]. The cognitive radio technique, which 
is used for effective utilization of under-utilized spectrum, 
can also experience fading and attenuation problems. The 
combination of the two above-mentioned techniques 
emerged as cooperative cognitive wireless communications 
(CCWC), which can enhance wireless bandwidth efficiency 
and mitigate these communication problems. In CCWC, both primary 
and secondary users are willing to exchange under-utilized 
bandwidth with each other for robust communication. Hence, 
CCWC is a mutual type of cooperation between two different 
types of bandwidth resources, secondary and primary [6]. 

In recent years, wireless communication demand 
has greatly increased because of the increased number of 
communication devices and services. The main resource in 
communication is bandwidth which, like natural resources, 
is scarce and limited. Much modern and 
traditional research focuses on the efficient utilization 
of bandwidth. However much the number of communication 
devices grows, the overall bandwidth spectrum cannot be increased. 
With cooperative cognitive communication, this problem 
can be addressed to a large extent [3]. 

The application of cognitive radio (CR) technology 
mainly depends on the sensing and signal processing 
capabilities of the cognitive users. The availability of the 
technological infrastructure is not a problem. 
The need and the application lie instead with the frequency 
allocation policies implemented by frequency 
allocation authorities. Currently the FCC and the European 
Telecommunications Standards Institute (ETSI) allocate 
bandwidth to specific users or applications. This strategy of 
dividing and setting aside spectrum, used by the regulatory authorities, 
has to be modified to cater to the needs and implementation of 
cognitive radio technology. 

Examples of licensed frequency spectrum bands 
include air traffic control, radio, television, cellular, and 
satellite communications. Some applications are security 
sensitive while others are quality sensitive. The advantages 
of this fixed bandwidth allocation strategy are minimal 
interference and enhanced quality of service (QoS) 
among communicating partners. The spectrum regulatory 
authorities have to take care of all these issues while improving 
bandwidth efficiency using CR technology [3]. 

In addition to the licensed bandwidth, there are also 
unlicensed bands, which facilitate the development and 
experimentation of newly emerging technologies. For 
example, the 2.4 GHz unlicensed band is available for cordless 
phones, WiFi and Bluetooth technology. Whether the spectrum 
band is licensed or unlicensed, the application of 
cognitive radio technology can greatly enhance bandwidth 
usage if operated with minimal interference. 
Whatever the defining and explanatory features of 
cognitive radio are, the main philosophy of implementation 
lies in awareness of the network environment [7]. 

Cognitive communication can be classified into three categories, 
described as follows [7]: 

1) In the Underlay technique, the cognitive (secondary) 
transmitter has knowledge of the channel 
strength. The cognitive users can transmit 
simultaneously with non-cognitive users as long as the 
interference caused to primary users by the secondary 
users is below some acceptable limit, and the cognitive 
users' transmit power is limited by the interference 
constraint. 

2) In the Overlay Cognitive technique, the cognitive nodes 
know the channel gain, the codebooks and the 
messages of the non-cognitive users. A cognitive user can 
transmit simultaneously with non-cognitive users at any power; 
the interference to the non-cognitive users can be offset by 
using part of the cognitive user's power to relay the 
non-cognitive users' messages. This technique can also be named 
cooperative cognitive wireless communication (CCWC). 

3) In the Interweave technique, the cognitive user knows 
the spectral holes in space, time and frequency, that is, when 
the non-cognitive user is not using these parts of the 
bandwidth. The cognitive user transmits simultaneously 
with a non-cognitive user only in the event of a false 
spectral hole detection. The cognitive user's transmit power 
is limited by the range of its spectral hole sensing. 
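
The interference constraint of the Underlay technique can be illustrated with a minimal sketch. It assumes a simple linear model that is not stated in the source: the interference seen by the primary receiver equals the secondary transmit power multiplied by a channel power gain. The function name and all parameters are hypothetical.

```python
def underlay_transmit_power(p_max, gain_to_primary, interference_limit):
    """Largest secondary transmit power allowed under an underlay constraint.

    Assumed model (not from the source): interference at the primary
    receiver = transmit power * gain_to_primary, and it must not exceed
    interference_limit. The device itself cannot exceed p_max.
    """
    if gain_to_primary <= 0:
        # No coupling to the primary receiver: only the device limit applies.
        return p_max
    return min(p_max, interference_limit / gain_to_primary)


# Strong coupling to the primary receiver forces a power back-off,
# weak coupling leaves the device limit as the binding constraint.
strong = underlay_transmit_power(p_max=1.0, gain_to_primary=0.5, interference_limit=0.1)
weak = underlay_transmit_power(p_max=1.0, gain_to_primary=0.05, interference_limit=0.1)
```

In this sketch the secondary user never needs to decode the primary message; it only needs an estimate of the channel strength, which matches the knowledge attributed to the Underlay transmitter above.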

In CCWC, the primary transmitter nodes lend their 
bandwidth capacity to the relay (secondary) nodes, and 
the secondary nodes behave as cooperative (cognitive) 
partners for the primary nodes. In [6], a solution 
for spectrum leasing is proposed and investigated, based on 
the idea that secondary nodes 
can earn spectrum access in exchange for cooperation with 
the primary link. By casting the 
problem in the framework of Stackelberg games, analytical 
and numerical results are provided, which confirm the 
considered model as a promising paradigm for cognitive radio 
networks. 

Another example of cooperative cognitive wireless 
communication (CCWC) can be found in [8], which 
resembles our proposed species communication model. 
Each type of primary and secondary user corresponds to a 
species; the species are geographically separated and cooperate 
with each other through the exchange of bandwidth resources. Primary 
users offer spectrum opportunities, along with a price, to the 
secondary users through a centralized controller or a dedicated 
control channel, while secondary users are sub-grouped on 
the basis of environmental parameters, mutual interference 
and primary spectrum availability. 

The main cognitive tasks can be summarized as follows: 

• Radio scene analysis, which includes interference 
temperature estimation for the radio environment and 
the detection of under-utilized spectrum 
bands, called spectrum holes. 

• Identification of the channel, for channel state information 
(CSI) estimation and prediction of the capacity of the 
channel to be used by the transmitter. 

• Control of transmitter power and dynamic spectrum 
management. 

Figure 1 describes the scenario for the existence of 
secondary users (SU). The central solid circular area is the 
geographical locality where the 
primary transmitter's (PT) under-utilized bandwidth can be 
consumed by SUs with the least interference. 
The cognitive bandwidth can 
be utilized in any one of the three scenarios already described: 
Underlay, Overlay, and Interweave. 

These cellular localities, which combine 
different wireless connection technologies, build up a 
heterogeneous network as shown in Figure 2. The localities 
are characterized by the communication technologies 
and the types of users present in a particular cell. Within the 
scope of the system proposed and analyzed in this study, the 
heterogeneous network is taken as an ecosystem. The mobile 
user types represent organisms that communicate with 
each other through biological protocols. 

Evolutionary mechanism design can also be used to 
analyze a hybrid approach consisting of agent-based, 
game-theoretic, and evolutionary mechanisms. There has been a lot of 
work on analyzing the Nash or evolutionary equilibrium for 
agents with a fixed set of strategies to play. Much of 
the related work analyzes a double auction scenario. 
There are scenarios in real practical systems where the 
agents are not willing to play any of the given strategies. 
This situation might arise when no combination 
of the given set of strategies constitutes a Nash Equilibrium (NE). 

Fig. 1: Example: Model Scenario of Cooperative Cognitive 
Wireless Communication. (Figure labels: Primary Transmitter 
Areas 1, 2 and 3; SU = Secondary User.) 

Fig. 2: Heterogeneous Ecological Network. 

For a game with a small number of agents or players, the 
payoff matrix is not large and is easy to calculate. But when 
the agent space is large, convergence takes 
a long time and the payoff calculation becomes cumbersome. 
Hence, for a compact representation of the payoff matrix in 
normal form with $k$ strategies, the entries can be defined 
in the following way: 

$$D = (d_1, \ldots, d_k) \tag{1}$$

where $d_i$ is the total number of players adopting the $i$-th strategy. 
Every entry $d \in D$ is further mapped to another vector of the 
following form, 

$$E = (e_1, \ldots, e_k) \tag{2}$$

where $e_i$ is the expected payoff of a player adopting the $i$-th 
strategy. So, for $n$ players, the total number of entries 
in the payoff matrix is: 

$$P = \frac{(n + k - 1)!}{n!\,(k - 1)!} \tag{3}$$
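
The count in Eq. (3) can be checked numerically with a short sketch. It enumerates every vector $D = (d_1, \ldots, d_k)$ of non-negative integers summing to $n$ and compares the number of vectors with the closed form. The function names are hypothetical, and the sizes n = 6 and k = 3 are illustrative values only.

```python
from math import comb


def num_payoff_entries(n, k):
    """Closed form of Eq. (3): (n + k - 1)! / (n! (k - 1)!)."""
    return comb(n + k - 1, k - 1)


def strategy_distributions(n, k):
    """Yield every D = (d_1, ..., d_k) with d_i >= 0 and sum(D) == n."""
    if k == 1:
        yield (n,)
        return
    for d in range(n + 1):
        for rest in strategy_distributions(n - d, k - 1):
            yield (d,) + rest


n_players, n_strategies = 6, 3
entries = list(strategy_distributions(n_players, n_strategies))
compact_size = num_payoff_entries(n_players, n_strategies)
full_size = n_strategies ** n_players  # one entry per joint strategy profile
```

For these values the compact representation needs far fewer entries than a table indexed by every joint strategy profile, because it only records how many players use each strategy and not which players do.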



B. Contributions 

The contributions of this paper can be summarized as 
follows: 

• The transformation of the cooperative cognitive wireless 
system into an ecological model is given as one of the major 
objectives. 

• The cooperative cognitive ecological game is defined for 
use in research. 

• The selection and population dynamics are given for the 
cooperative cognitive wireless scenario. 

The rest of the paper is organized as follows: Section II 
describes the natural selection procedure with an explanation of 
the ecological G-function. Section III explains the centralized and 
decentralized species model in the macrocell environment. 
Section IV describes the ecological game and the rationale 
behind using this approach in this work. Section V describes 
the selection dynamics available in ecological modeling 
for the cooperative and cognitive scenario. Sections VI and 
VII give the description and foundation for two-species 
population dynamism. Different game parameters for the cooperative 
cognitive ecological game are given in Section VIII. The 
relationship type for competition is given in Section IX, with 
the summary of the paper in Section X. 

II. Natural Selection 

Biological systems are engineered by the process of 
natural selection, which results in significant and interesting 
aspects of evolution. The basic idea of evolution can be 
explored through [9] and stated as follows: 

1) Like tends to beget like, and heritable variation 
in traits can be observed in each type of 
organism. 

2) Organisms struggle for existence among themselves. 

3) The struggle for existence is influenced by heritable 
traits. 

The modern approach to natural selection is explained through 
genetic interaction. The genetic interaction between 
organisms results in the heritable characteristics of the newborn. 
The survival of an individual having a particular phenotype 
is influenced by population size, gene frequencies, 
or both. 



A. Ecological G-function 

The evolution of strategies through the mechanism of 
natural selection is a major field in biological games. 
The tools developed by ecologists for understanding 
evolutionary concepts play an important role in understanding 
major natural algorithmic mysteries. The fitness generating 
function (G-function) is a tool for analyzing the natural 
selection process based on Darwin's theory of evolution. 
It is this natural selection process that results in the 
stable strategies called evolutionary stable strategies (ESS). 

Compared to the established evolutionary game-theoretic 
concept for wireless communication, the G-function can introduce 
a virtual strategy. With the help of the virtual strategy, any mobile 
user can adapt its parameters (e.g., bandwidth resource 
block allocation, transceiver power, coding, and modulation) 
based on the available strategies (strategy set), the 
strategies adopted by the other mobile nodes (organisms), the total 
number of mobile users in the cell (ecosystem), and the total 
available resource (bandwidth). With this G-function, the dynamics of 
heritable strategies and the evolution by natural selection can 
be developed. 

Let $X$ be the set of species population 
densities, and $U$ be the set of strategies available to each member 
of the species set $X$; then $H(U, X)$ is the fitness matrix. The 
fitness generating function $G(v, U, X)$ can be defined if 

$$G(v, U, X)\big|_{v = U_i} = H_i(U, X) \tag{4}$$

for $i = 1, \ldots, n_s$, where $n_s$ is the number of strategies [10]. 
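
A minimal sketch of Eq. (4) is given below. The source does not specify a concrete G-function, so the sketch assumes a Lotka-Volterra style competition form with Gaussian carrying capacity and competition terms. These functional forms, the parameter values, and all function names are hypothetical and serve only to show how the fitness of species $i$ is recovered by substituting its own strategy for the virtual strategy $v$.

```python
import math


def carrying_capacity(v, k_max=100.0, sigma_k=2.0):
    """Assumed Gaussian carrying capacity as a function of strategy v."""
    return k_max * math.exp(-v ** 2 / (2 * sigma_k ** 2))


def competition(v, u_j, sigma_a=2.0):
    """Assumed competition felt by strategy v from a user with strategy u_j."""
    return math.exp(-(v - u_j) ** 2 / (2 * sigma_a ** 2))


def g_function(v, U, X, r=0.25):
    """G(v, U, X): per-capita growth of a focal user with virtual strategy v."""
    crowding = sum(competition(v, u_j) * x_j for u_j, x_j in zip(U, X))
    k = carrying_capacity(v)
    return r * (k - crowding) / k


def fitness(i, U, X):
    """H_i(U, X), obtained as in Eq. (4) by setting v equal to U[i]."""
    return g_function(U[i], U, X)


U = [0.0, 1.0]    # strategies used by two species (illustrative values)
X = [50.0, 20.0]  # their population densities (illustrative values)
h = [fitness(i, U, X) for i in range(len(U))]
```

Because $v$ is a free argument, the same function can also be evaluated at a strategy that no species currently uses, which is how a virtual strategy is probed in this formulation.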

III. Wireless Communication Species Model 

Mobile users of the same type (organisms) constitute a 
species. For practical implementation, the species model 
could be centrally controlled through a base station controller, 
or it could operate as a set of self-organizing autonomous agents. 
For the first approach, Figure 3 shows a species model with a 
centralized controller at the base station. The species are 
formed like microcells within a macrocell (ecosystem). The 
evolutionary game is played among the members of each 
species, and inter-species game coordination is handled by 
the central controller. 

Fig. 3: Example of Cooperative Cognitive Network: III Species 
with Centralized Controller. 

For the latter approach of autonomous interacting agents, 
shown in Figure 4, the organisms in each species interact with 
each other randomly. Similarly, a primary user (PU) who 
wants to establish a cooperative association with any mobile node 
has to interact with the neighboring mobile node, play the game, 
and evolve through the natural selection procedure. 

Fig. 4: Example of a Cooperative Cognitive Evolutionary 
System Model. 

IV. Evolutionary Ecological Game 

The total number of players in the system plays an important 
role in species interaction. This field relates to 
the population dynamics of mutually competitive species. The 
frequency of strategies plays an important role in modeling 
competition among the individuals of the ecosystem. The 
population dynamics are determined by the inter- and intra-species 
interactions. These kinds of interactions are important for 
determining the evolutionary equilibrium. Different competition 
models among the members of the species can be formulated 
for resource distribution and optimal consumption of 
resources. 


A. Evolutionary Ecological Game Rationale 

Game theory (GT) has been successfully used to describe 
the strategic interaction between the nodes in a 
communication network. The equilibrium state can be determined 
and analyzed from the Nash equilibrium (NE). In 
evolutionary game theory (EGT), which is a specialized branch 
of GT, the equilibrium behavior of the system is described 
by the evolutionary stable strategies (ESS). For the evolutionary 
ecological game, the system is considered as a species 
model in the ecosystem. The following points describe 
the reasons that encourage applying the ecological concept to 
CCWC. 

• Pareto-Optimality Solution: In the conventional 
game-theoretic approach, the equilibrium of any 
strategic interaction between players is analyzed using the 
Nash equilibrium. At the Nash equilibrium point no player 
is willing to deviate from its current strategic position. 
This equilibrium point could be sub-optimal. A better 
global optimal position in the system cannot be reached, 
because no player is moving away from its 
current strategy. This global optimal position can be 
achieved through the evolutionary game-theoretic approach 
by calculating the ESS point. 

• Adoption of New Strategies: In the classical non-cooperative 
game-theoretic approach, players play their 
strategies at once, thus reaching the Nash Equilibrium 
point. In ESS, the players slowly adopt new 
strategies, leading toward the system equilibrium point. 

• Learning Process: The evolutionary game-theoretic approach 
has a significant advantage over the conventional 
approach because of the learning phenomenon. The players 
can observe the strategies of other players and update 
their own strategies. 

V. Selection Dynamics based on Ecosystem 

The basic work on evolutionary game theory (EGT) can 
be traced back to the work by Smith and Price [11]. 
The decision made by the rational mobile users (the players 
in our system) in EGT is determined by channel state 
information (CSI), the embedded decision-making algorithm 
(e.g., evolutionary Q-learning), and the limited availability of 
computational time. The EGT approach starts with a random 
allocation of power and resource blocks, along 
with an association with the macrocell or a femtocell. The goal 
of applying EGT is to attain an evolutionary stable strategy 
(ESS) equilibrium, which resists any environmental strategy 
parameter (mutant strategy) that would change the optimal and 
robust equilibrium position. 

For analysis in the formulation of EGT, population 
or selection dynamics equations are used [12]. The 
three important parameters in selecting a dynamical 
approach are time, the utility function, and the mobile user 
population. A decision has to be made between 
discrete and continuous parameters for time and 
utility, and between a finite and an infinite mobile user 
population in the area of observation. Another criterion for 
selecting the dynamic approach is whether the process is 
stochastic or deterministic. In other words, selection dynamics 
formulate how users adapt to new strategies as the population 
and time change. The concept of mixed strategy can be 
defined with the strategy distribution profile [13]. 

A. Deterministic Dynamics 

Deterministic selection dynamics are applicable to 
scenarios with a normal time period of application. 
Some of the dynamics applicable to our system are given below. 

1) Replicator Dynamics (RD): Replicator dynamics (RD) 
are the most commonly used dynamics for EGT analysis. 
In RD, the strategy frequency profile changes in response 
to the changing payoff parameter vector [14]. 
In RD analysis, the utility gain of any mobile user is 
determined by the strategy actions of all other players in 
the current geographical context. 

Earlier work related to replicator dynamics by R. Taylor can be 
found in [15]. As an example, consider a system model 
of a macrocell underlaid by femtocells. The population can 
be divided into users belonging either to the MBS ($0$) or to 
FBS $i$, $\forall i \in \{1, \ldots, N_f\}$, thus forming 
$x = \{0, 1, 2, \ldots, N_f\}$ with corresponding 
mobile user population frequencies given by the set 
$Y = \{y_0, y_1, y_2, \ldots, y_{N_f}\}$ and $\sum_i y_i = 1$. In 
replicator dynamics, it is assumed that each user can switch 
randomly between the macrocell and the femtocells, or from one 
femtocell to another femtocell. The payoff matrix ($U$) is 
defined in such a way that when a user of type $i \in x$ interacts 
with a user of type $j \in x$, it gets the reward $U_{ij} \in U$. 
With the above notation, the replicator dynamics equation for 
the mobile user switching frequency can be given as [12]: 

$$\dot{y}_i = y_i\left((Uy)_i - y^{T} U y\right) \tag{5}$$

where $(Uy)_i$ is the expected payoff of a mobile user that suffers 
interference and tends to switch between the cells, 
and $y^{T} U y$ is the average expected utility gain of the 
population. A strategy whose payoff exceeds this average tends 
to increase in frequency. As the mobile user has more options to switch 
between cells, the utility function takes a more complex form [16]. 
The generic RD form can be given as: 

$$\dot{y}_i = y_i\left(f_i(y) - \bar{f}(y)\right) \tag{6}$$

where $f_i(y)$ is a fitness utility function and 
$\bar{f}(y) = \sum_i y_i f_i(y)$ is the average fitness. 
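
The replicator equation (5) can be explored numerically with a minimal sketch based on forward Euler integration. The two-type payoff matrix below is a hypothetical Hawk-Dove style example chosen only because its interior equilibrium is easy to verify by hand; it is not taken from the source, and the step size and function names are likewise assumptions.

```python
def replicator_step(y, U, dt=0.01):
    """One forward-Euler step of dy_i/dt = y_i * ((U y)_i - y^T U y)."""
    n = len(y)
    payoff = [sum(U[i][j] * y[j] for j in range(n)) for i in range(n)]  # (U y)_i
    average = sum(y[i] * payoff[i] for i in range(n))                    # y^T U y
    y_new = [y[i] + dt * y[i] * (payoff[i] - average) for i in range(n)]
    total = sum(y_new)  # renormalise to remove floating-point drift
    return [v / total for v in y_new]


def simulate_replicator(y0, U, steps=5000, dt=0.01):
    """Iterate the Euler step and return the final frequency vector."""
    y = list(y0)
    for _ in range(steps):
        y = replicator_step(y, U, dt)
    return y


# Hypothetical payoffs for two user types (e.g. cell 0 and cell 1).
U_example = [[-1.0, 2.0],
             [0.0, 1.0]]
y_final = simulate_replicator([0.1, 0.9], U_example)
```

With this matrix the two payoffs are equal when both frequencies are 0.5, so the simulated frequencies settle near that mixed state from any interior starting point.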

2) Best response dynamics (BRD): The best response 
dynamics (BRD) are used to model the myopic (short-sighted) 
behavior of rational mobile nodes. The underlying concept 
of BRD is that, in a large population of mobile 
users, a small number of mobile users can change their strategy 
according to the mean strategy of the population. The rate 
of strategy change can thus be given as: 

$$\dot{y} = \beta(y) - y \tag{7}$$

where $\beta(y)$ is the best reply $e$ to the strategy distribution 
$y$, such that $a^{T} A y \le e^{T} A y$ for any $a \in S_n$. One major 
issue with BRD is that the solution may not be unique. In 
short, BRD enables the players to predict the strategy by 
considering the mean population trend, which would otherwise 
be difficult to achieve. 
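
A minimal sketch of Eq. (7) follows, again using forward Euler integration. It assumes that the payoff of pure strategy $i$ against the population state $y$ is $(Ay)_i$ and restricts the best reply to pure strategies, breaking ties by taking the lowest index. The payoff matrix is the same hypothetical Hawk-Dove style example used for illustration and is not taken from the source.

```python
def best_reply(y, A):
    """Pure best reply to y as a unit vector (ties go to the lowest index)."""
    n = len(y)
    payoff = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]  # (A y)_i
    best = max(range(n), key=lambda i: payoff[i])
    return [1.0 if i == best else 0.0 for i in range(n)]


def brd_step(y, A, dt=0.01):
    """One forward-Euler step of dy/dt = beta(y) - y."""
    beta = best_reply(y, A)
    return [y_i + dt * (b_i - y_i) for y_i, b_i in zip(y, beta)]


def simulate_brd(y0, A, steps=1000, dt=0.01):
    """Iterate the Euler step and return the final frequency vector."""
    y = list(y0)
    for _ in range(steps):
        y = brd_step(y, A, dt)
    return y


A_example = [[-1.0, 2.0],
             [0.0, 1.0]]
y_brd = simulate_brd([0.1, 0.9], A_example)
```

The arbitrary tie-breaking rule in `best_reply` is one concrete face of the non-uniqueness noted above: when several strategies are equally good, the trajectory depends on which one is selected. In discrete time the state also chatters within roughly one step size of the mixed equilibrium instead of resting on it.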

3) Smoothed best replies (SBR): One drawback of 
BRD is the possibility of multiple equilibrium strategy profiles. 
SBR dynamics modify BRD so that the solution is 
unique. A parameter $\varepsilon > 0$ is introduced in such 
a way that BRD becomes a special case of SBR, which can be 
given as: 

$$\dot{y}_i = \frac{e^{a_i(y)/\varepsilon}}{\sum_j e^{a_j(y)/\varepsilon}} - y_i \tag{8}$$

Taking $\varepsilon \to 0$ as a special case, BRD is obtained. 
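
The smoothed best reply can be sketched as a logit (softmax) response. The sketch assumes the common form in which each strategy is weighted by the exponential of its payoff divided by the smoothing parameter, with the payoff of strategy $i$ taken as $(Ay)_i$. The payoff matrix, the value of the smoothing parameter, and the function names are hypothetical.

```python
import math


def smoothed_best_reply(y, A, eps=0.1):
    """Logit response: weights proportional to exp(payoff_i / eps)."""
    n = len(y)
    payoff = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    top = max(payoff)  # subtract the maximum for numerical stability
    weights = [math.exp((p - top) / eps) for p in payoff]
    total = sum(weights)
    return [w / total for w in weights]


def sbr_step(y, A, eps=0.1, dt=0.01):
    """One forward-Euler step of dy/dt = smoothed_best_reply(y) - y."""
    reply = smoothed_best_reply(y, A, eps)
    return [y_i + dt * (r_i - y_i) for y_i, r_i in zip(y, reply)]


def simulate_sbr(y0, A, eps=0.1, steps=2000, dt=0.01):
    """Iterate the Euler step and return the final frequency vector."""
    y = list(y0)
    for _ in range(steps):
        y = sbr_step(y, A, eps, dt)
    return y


A_logit = [[-1.0, 2.0],
           [0.0, 1.0]]
y_sbr = simulate_sbr([0.1, 0.9], A_logit)
```

Unlike the pure best reply, the logit response is a single-valued, continuous function of the population state, which is why no tie-breaking rule is needed here. As the smoothing parameter shrinks, the response concentrates on the highest-payoff strategy and approaches the pure best reply.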

VI. Two species Evolutionary Game based on 
Population Dynamics 

For two-species evolutionary population dynamics, [9]
formulated separate strategy frequency dynamics for species
I and J, given as follows:

(9)



where $q$ is the population size, $\dot{q}$ is the rate of change of
the population, $i = 1, \ldots, k$ is the strategy type index, and
$F$ and $\bar{F}$ are the growth and mean growth rates.

The growth rates can be defined as: 



$$F_i^I = \left[\pi^{II} e_i U^{II} q^I + \pi^{IJ} e_i U^{IJ} q^J\right] / N^I$$



benefit from the interaction, and no user harms the other
user. Negative competition resembles the inherent cheating
behavior of cognitive users that behave selfishly. Such
competition can harm one type of user (species I) while
benefiting the other type (species II). The mutualistic or
symbiotic approach is the required scenario for cooperation,
where the capacities of both primary and secondary users are
enhanced. The competition types for population dynamism (PD)
are given in Figure 6.



$$F_j^J = \left[\pi^{JI} e_j U^{JI} q^I + \pi^{JJ} e_j U^{JJ} q^J\right] / N^J \qquad (10)$$



where $\pi$ is the encounter rate between the species, and $N$
is the total number of each user type.

The work in [17] highlights the use of species frequency
or density and presents results, together with an explanation
of established models and special cases, for a general
two-species model, as seen in Eq. 9 and Eq. 10. Moreover,
the general model leads to natural questions in evolutionary
game theory. The authors show that classical Gause or
Lotka-Volterra species interactions emerge when strategy
selection pressures are solely inter- or intra-species,
respectively. Systems that combine population dynamics with
strategy evolution are more stable than one would expect
from looking at either effect separately.

VII. Population Dynamism 

In population dynamism, different types of species may
have complex interactions among their members.
The members of each species are involved in inter-
as well as intra-species competition, as mentioned above and
shown in Figure 5.




Fig. 6: Population Dynamism for Interacting Mobile Users. 



VIII. Cooperative Cognitive Ecological Game 
Parameters 

The cooperative cognitive wireless communication (CCWC)
relationship in the ecological model is best exemplified by the
mutual coalition of algae and fungi, which is called lichen, or
by the relationship between sea anemones and hermit crabs. This
kind of natural coalition is important for the existence of both
types of species. Some important parameters of the cooperative
cognitive ecological game are given in Table I and explained
as follows:



The population dynamism in CCWC can be classified
into the following three classes:

• Positive Competition 

• Negative Competition 

• Mutualism or Symbiosis 
Fig. 5: Interaction effect of inter-species collaboration.

Positive competition resembles a coalition, in which
relaying capacity is available to primary users (PUs) and
cognitive bandwidth is consumed by the secondary users
(SUs). During the competition, either type of users can get



Dependency: The resource demand of one type of species
on the other type of species can be classified as follows:

• Obligate Mutualism (OM) - High Mutualistic Dependency

• Facultative Mutualism (FM) - Low Mutualistic Dependency

• Asymmetric Mutualism (AM) - No Dependency

In the obligate mutualism (OM) scenario, the survival of primary
and secondary users depends on the degree of cooperation.
The more they cooperate, the more both types of users
benefit. If one type of species withdraws from cooperation,
cooperative communication is blocked. In the
facultative mutualism (FM) case, cooperation may not
be required by one type of species, but both cooperate to
enhance the social welfare function. In
the AM scenario, the cooperating species are indifferent to
cooperation.

N interactions: The mutualistic interaction can depend
on the number of interactions among the cooperative species.
This degree of interaction depends on the behavior of the
mutualistic reward and can vary from 1 to N.

Offer produced: Interaction among the members
of the species can start randomly or be based on offers.
Random interaction may take some time to reach evolutionary
convergence. In the other type of interaction, offers
are made by the participating species. A more beneficial
offer makes future cooperation more likely.
These offers are available even before cooperation
starts. After each cooperative interaction, each species member
considers the newly modified offers for prolonged
cooperation. Cheating is a major cause of deviation from
evolutionary stability and leads to the breaking of the
mutualistic contract.

Moves: The decisions made by the mobile users are called
moves. In a strategic competitive interaction, the decisions
made by the users are simultaneous or alternating. These
moves are further classified as repetitive or sequential.
With repetitive moves, there are many rounds of interaction,
depending on the willingness of the users according to the
benefit received. A special case is the one-shot
game, where the decision is made in a single interaction.

Mobility: Mobility is one of the major characteristics of
wireless systems. Migration of users from one cellular
community (ecosystem) to another is difficult to capture
in analytical models. Most scenarios assume a sessile
(stationary) population of mobile users. Models based on
differential equations are a useful way to account for
mobility. Other special cases, for femtocell
networks (FN), are the closed-access and open-access scenarios.
In the closed-access scenario, the sessile population model is
used for analysis. In the open-access scenario, mobile users
can move from one ecosystem to another.

Active choice: Active choice is concerned with the partner
selection mechanism. In the mutualistic scenario in CCWC,
active choice is important for selecting pairs that have
more affinity toward each other. The more mutual
benefit the partners obtain from each other, the greater their
tendency toward mutualistic cooperation. For open-access FN,
active choice is also considered in the selection of a base
station from the available femtocells or macrocell.

Partner recognition: Partner recognition plays its role
in mutualistic cooperation when the pairing of cooperative
mobile users is done by a centralized controller.
The centralized controller implements an algorithm for a
robust and reliable CCWC system.



TABLE I: Critical Parameters for Cooperative Cognitive Game.

Parameter                  Possible Combinations (Primary, Secondary)
Active Choice              Yes, No
Behavioral Options         Cooperate, Defect
Control over Interaction   Full, Limited
Dependency                 High, Low
Investment                 Yes-No, Variable
Mobility                   Mobile, Sessile
Moves                      Sequential, Simultaneous, Alternating
N Interaction              One-off, Repeated
Offer Produced             Prior, During, After
Partner Recognition        Yes, No
Payoff Symmetry            Symmetrical, Asymmetrical



TABLE II: Analogy: CCWC vs. Ecosystem.

Communication Network                          Ecology
Communication                                  Ecology
Communication System                           Ecosystem
Mobile Devices                                 Organism
Communication Network                          Species
Heterogeneous Network                          Ecosystem
Coexisting Networks (CN)                       Biodiversity
Each subnetwork in HN                          Biome
World Wide Web                                 Biosphere
Wireless environment favoring communication    Niche
Environmental existence of any network         Habitat
Communication system engineering               Niche construction (Eco-sys engineering)
Handover                                       Migration
Communication Network                          Food Web
Secondary Users                                Consumers
Primary Users                                  Producers
Relays                                         Producers
Centralized Controllers                        Keystone species
Resources                                      Biomass
Cooperative Cognitive Wireless Communication   Mutualism/Symbiosis
Interference                                   Parasitism
Spatial Distribution                           Biogeography
Mobile density                                 Population density
Mobile user growth                             Population growth
BAN energy generation from body                Metabolism



Investment: Investment means that the game players
contribute toward the social welfare function by playing an
altruistic strategy. Investment is concerned with future rewards,
which may be realized after the cooperation process is over.

Payoff symmetry: This parameter is required to incorporate
the cheating behavior of rational mobile users.

Control over interaction: The game players (mobile users)
have full control over the interaction process and can end
it at any time without harming the partner.



Behavioral options: In the cooperative scenario, the interacting
mobile nodes have the option to cheat or cooperate. Cheating
can halt mutualistic behavior. Mechanism theory is the
technique for designing cooperative games in such a way that
cheating is excluded from the players' rational decision
choices.



IX. Competition and Relationship Types 

The competition among the mobile users can take any of
the following forms:

• Resource-resource competition 

• Resource-service competition 

• Service-service competition 






Cognitive wireless networks possess flexible architectures
based on dynamically reconfigurable cognitive radios, where
the concept of cognition is used to reach various end-to-end
objectives, including frequency agility and adaptivity. Most
of the current focus in cognitive networking systems is on
optimizing local network coordination and control. This work is
concentrated at the device level, and there is no specific
direction for characterizing the scale at which cognitive network
interactions can co-evolve, grow, and flourish, or for
understanding their role and impact on a global scale in future
communication technologies.

Biological Communication Diversity (BioCommDiv) measures
the existence of heterogeneous types of networks within
a particular geographical area or locality. In such a system,
interoperability of devices and services matters greatly
for seamless communication among different types of devices
with different protocols and operating systems (OS).
A higher BioCommDiv factor indicates a healthier communication
system (CS). Figure 7 shows different scenarios for
cooperative communication, cognitive radio, inter-cell
coordination, and the evolution toward an ecological interacting
cooperative cognitive network.




Fig. 7: Interacting Mobile Networks. (a) Cooperative Communication;
(b) Cognitive Communication; (c) Inter-cell Networks;
(d) Ecological Coop-Cog Network.



X. Summary 

This paper establishes the basic model for a cooperative
cognitive ecological system. It shows that there is an almost
complete analogy between the communication system and the
ecological system. Evolutionary algorithms and protocols
are robust enough to reshape communication techniques
and strategies. This paper forms the foundation for
ecological communication systems. Future work based on the
cooperative cognitive ecological system is promising for
many research disciplines.
Abstract:

Wireless local-area networks (WLANs) based on the IEEE 802.11 a/b/g standards are growing
rapidly. WLANs provide the benefits of network connectivity without the restrictions of
being tied to a location or restricted by wires. Despite the convenience of mobility, the
performance of a WLAN must be assessed carefully before it is adopted and
deployed in any environment. In our research, we addressed the impact of various key
parameters on the actual performance of IEEE 802.11g. We performed a series of
experiments to assess the performance of IEEE 802.11g in the presence of interference
and to find the maximum throughput under realistic conditions. In addition, the impact
of co-channel interference, adjacent-channel interference, and noise on WLAN speed
was examined. Overall, we conducted an independent set of experiments to measure the
effective application-level throughput of IEEE 802.11g. The analysis results and
measurements provide insights into the provisioning required for an 802.11g WLAN to
ensure that it provides the needed coverage and capacity for the intended users.

I- Introduction: 

Wireless technologies give users freedom of mobility by removing the constraint of
physical connections: network connections become cable-free. Wireless technologies
use radio frequency (RF) as the transmission medium and allow organizations to
eliminate cables for simpler network management at effective cost. The IEEE 802.11
standard establishes several requirements for the RF transmission characteristics of an
802.11 radio. These include the channelization scheme as well as the spectrum
radiation of the signal (that is, how the RF energy spreads across the channel
frequencies). In IEEE 802.11g, channels 1, 6 and 11 are considered non-overlapping,
hence the premise that these channels can be used so that multiple networks can
operate in close proximity without interfering with each other [6]. Interference has
always been considered an unavoidable peril in wireless networks. Based on the channel
of origin, interference can be categorized as co-channel (from transmissions on the
same channel as the receiver) or adjacent-channel (from transmissions on adjacent and
overlapping channels). The IEEE 802.11 b/g standards operate in the unlicensed 2.4 GHz
ISM spectrum, in which 11 of the 14 channels are available for use in the US [4, 5].
Overlapping channels degrade network performance. Little research has been
conducted in this area. In this paper we present a full-scale performance study and
analysis of IEEE 802.11g to measure its effective application-level throughput under
different scenarios. Improving its performance requires a clear understanding of WLAN
behavior; therefore, measuring and analyzing the performance of the system under
realistic conditions is of paramount importance. In our experiments, we studied the
effect of interference on TCP traffic.
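
The premise that channels 1, 6 and 11 do not overlap can be checked with a short calculation. The sketch below assumes the usual 2.4 GHz channel plan (5 MHz spacing, channel 1 centered at 2412 MHz) and a nominal channel width of 22 MHz; the width and the function names are assumptions made for illustration, not values taken from our measurements.

```python
def center_mhz(channel):
    """Center frequency in MHz of 2.4 GHz channel 1-13 (5 MHz spacing)."""
    if not 1 <= channel <= 13:
        raise ValueError("formula covers channels 1-13 only")
    return 2407 + 5 * channel

def channels_overlap(ch_a, ch_b, width_mhz=22):
    """True if two channels of the assumed width share spectrum."""
    return abs(center_mhz(ch_a) - center_mhz(ch_b)) < width_mhz

print([center_mhz(c) for c in (1, 6, 11)])  # [2412, 2437, 2462]
print(channels_overlap(1, 6))               # False
print(channels_overlap(1, 5))               # True
```

Under these assumptions, two channels overlap whenever they are fewer than five channel numbers apart, which is why 1, 6 and 11 are the usual non-overlapping set.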

The rest of the paper is organized as follows. Section II discusses related work.
The experimental setup is described in Section III. The performance evaluation
methodology is explained in Section IV. The results obtained are described in
Section V, and Section VI presents our conclusion and discusses future work.

II- Related Work: 

Stine et al. (2003) [7] stated that interference has always been considered an
unavoidable peril in wireless networks. To turn this effect around, they constructed
simple analytical and empirical models of such interference in IEEE 802.11 networks
and illustrated two scenarios where it can be exploited. Banerjee (2006)
[1] stated that in 802.11 and other wireless networks, adjacent-channel interference is
considered a peril. To avoid it, two simultaneously communicating nodes
in close proximity are assigned to different non-overlapping channels; for example,
channels 1, 6, and 11 in 802.11b are non-overlapping. Boulmalf et al. (2006) [2]
addressed the impact of various key parameters on the actual performance of IEEE
802.11g in the presence of interference and found the maximum throughput under
realistic conditions. Their analysis results and measurement campaign provided insights
into the provisioning required for an 802.11g WLAN to ensure that it provides the needed
coverage and capacity for the intended users. Liese et al. (2006) [3] studied the relative
performance of single and multiple channels in both single-hop and multi-hop wireless
mesh networks. In one of their experiments, they studied the effect of antenna placement
on the access point and determined its impact on performance. Sharma et al. (2006) [6]
characterized the performance of multi-channel IEEE 802.11g wireless networks. They
conducted experiments on a sample topology consisting of just two flows on
non-overlapping channels and found that the expected increase in throughput was seen only
when the separation between the antennas of the radio devices was above a threshold
value.

III- Experimental Setup: 

One testing location at the University of Agriculture Faisalabad was selected for the
research. The testing environment consisted of two 802.11g APs, four 802.11g wireless
cards, two identically configured IBM ThinkPad laptops running Windows XP, and two
identically configured Dell Latitude laptops, as shown in Figure 1.1. The laptops were
placed at a distance of 5 meters from the APs. Experiments were conducted by keeping the
channel of one AP constant while changing the channel of the other AP in order to measure
the interference caused by channel overlap. Iperf was used to generate TCP traffic and
measure the throughput. SPSS was used to analyze the results of our experiments.

Figure 1.1 Experimental Setup (diagram labels: DWL-G650 cards fixed on channel 6;
5 meters between laptops and APs; DWL700AP and DWL2100AP with channels varied up to 11)




A. Hardware Requirements

Sr. No.  Brand Computer       Operating System      Processor  RAM
1        IBM T22 ThinkPad     Microsoft Windows XP  900 MHz    256 MB
2        IBM T22 ThinkPad     Microsoft Windows XP  900 MHz    256 MB
3        Dell Latitude D 600  Microsoft Windows XP
4        Dell Latitude D 600  Microsoft Windows XP

Sr. No.  Access Points   WLAN Standard
1        D Link DWL-700  IEEE 802.11 g
2        D Link DWL-900  IEEE 802.11 g

Sr. No.  Wireless Network Cards  WLAN Standard
1        D-Link DWL G650         IEEE 802.11 g

B. Software Requirements

Sr. No.  Tools
1        DU Meter
2        Iperf
3        SPSS



IV- Performance Evaluation Methodology: 

In our experiments, we characterized the results on the basis of the TCP throughput
generated through Iperf. We measured the effect on throughput of changing the
frequency channel schemes:

AP / Wireless Cards   Frequency Channel
D-Link DWL 700AP      1
D-Link DWL 2100AP     1
D-Link DWL G650       6
D-Link DWL G650       6
D-Link DWL G650       6
D-Link DWL G650       6

Table 1.1 Frequency Channel Schemes

The frequency channel scheme shown in Table 1.1 was changed as follows: the 700 AP was
kept on channel 1 while the 900 AP was changed from channel 1 to 11; then the 700 AP was
changed to channel 2 and the 900 AP was again changed from 1 to 11, and so on. Each of
the 11 channels of the 700 AP was thus paired with each of the 11 channels of the
900 AP, making 121 possible combinations.
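
The sweep described above can be enumerated programmatically. The following is an illustrative sketch only; the variable names are hypothetical and it is not the script used in our experiments.

```python
from itertools import product

CHANNELS = range(1, 12)  # channels 1 to 11

# Outer loop: channel of the 700 AP; inner loop: channel of the other AP.
combinations = list(product(CHANNELS, CHANNELS))
co_channel = [(a, b) for a, b in combinations if a == b]

print(len(combinations))  # 121
print(len(co_channel))    # 11
print(combinations[:3])   # [(1, 1), (1, 2), (1, 3)]
```

Of the 121 combinations, the 11 pairs with equal channel numbers are the co-channel cases; the remaining pairs cover adjacent and non-overlapping channel settings.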

V- Experimental Results: 

The experiments were repeated 10 times in the university environment, and the mean of
the 10 result sheets was calculated. The detailed results are given below in the form
of graphs:



Figure 2.1 shows that when both the 2100AP and the 700AP are
on channel 1, TCP throughput decreases, while on all other
channels its value increases, because the same frequency
causes interference and the overlap reduces the throughput.

Figure 2.1: TCP Throughput Graph

Figure 2.2: TCP Throughput Graph

Figure 2.2 shows that when both the 2100AP and the 700AP are
on channel 2, TCP throughput decreases, while on all other
channels its value increases, because identical channels
overlap each other and cause interference. The highest
throughput is achieved when the channel of AP2 is in the
range 6-11.



Figure 2.3 shows that when both the 2100AP and the 700AP are
on channel 3, TCP throughput decreases, while on all other
channels its value increases, because identical channels
overlap each other and cause interference. The highest
throughput is achieved when the channel of AP2 is in the
range 6-11.

Figure 2.3: TCP Throughput Graph

Figure 2.4 shows that when both the 2100AP and the 700AP are
on channel 4, TCP throughput decreases, while on all other
channels its value increases, because the same frequency
causes interference and the overlap reduces the throughput.
The combinations 4-3 and 5-3 give smaller throughput values
than the other combinations.

Figure 2.4: TCP Throughput Graph



[Chart: TCP Throughput Chart, Frequency Channel of AP-A = 5;
horizontal axis: Frequency Channel of AP-B]

Figure 2.5: TCP Throughput Graph

Figure 2.5 shows that when both the 2100AP and the 700AP are
on channel 5, TCP throughput decreases, while on all other
channels its value increases, because identical channels
overlap each other and cause interference. The highest
throughput values are achieved with the combinations 2-5
and 10-5.



Figure 2.6 shows that when both the 2100AP and the 700AP are
on channel 6, TCP throughput decreases, while on all other
channels its value increases, because the same frequency
causes interference and the overlap reduces the throughput.

Figure 2.6: TCP Throughput Graph

Figure 2.7: TCP Throughput Graph

Figure 2.7 shows that when both the 2100AP and the 700AP are
on channel 7, TCP throughput decreases, while on all other
channels its value increases, because identical channels
overlap each other and cause interference. The lowest
throughput value is achieved when the channel of AP2 is

Figure 2.8 shows that when both the 2100AP and the 700AP are
on channel 8, TCP throughput decreases, while on all other
channels its value increases, because identical channels
overlap each other and cause interference. The lowest
throughput value is achieved when the channel of AP2 is
in the range 5-11.

Figure 2.8: TCP Throughput Graph

Figure 2.9: TCP Throughput Graph
Figure 2.9 shows that when both the 2100AP and the 700AP are
on channel 9, TCP throughput decreases, while on all other
channels its value increases, because the same frequency
causes interference and the overlap reduces the throughput.
The highest throughput values are obtained on channels 1-5.

Figure 2.10 shows that when both the 2100AP and the 700AP are
on channel 10, TCP throughput decreases, while on all other
channels its value increases, because the same frequency
causes interference and the overlap reduces the throughput.
Throughput decreases as the frequency channel moves from 1 to
11, except at channel 10, which gives the lowest value due
to interference.

Figure 2.10: TCP Throughput Graph

Figure 2.11: TCP Throughput Graph

Figure 2.11 shows that when both the 2100AP and the 700AP are
on channel 11, TCP throughput decreases, while on all other
channels its value increases, because the same frequency
causes interference and the overlap reduces the throughput.
Throughput decreases as the frequency channel moves from
1 to 11.
VI. Conclusion and Future Work

Overall, the results show that overlapping channels cause performance degradation. When
AP1 is on channels 1 to 5, setting AP2 to channels 1 to 5 gives low
throughput, while high throughput is achieved on channels 6 to 11.
When AP1 is set between channels 6 and 11, changing AP2 over channels 1 to 5 gives
higher throughput than channels 6 to 11, which give lower throughput in this case.
In the future, researchers can conduct experiments using the 802.11a standard. They can
compare the 802.11a, 802.11b, and 802.11g standards. They can use more
combinations of frequency channels, as 802.11a has more non-overlapping frequency
channels than 802.11b and 802.11g.
Abstract - This paper analyzes the cooperative cognitive wireless
communication (CCWC) model based on biological mutualism (BM).
In BM, different types of species help each other
for mutual existence in the ecosystem. We formulate the
CCWC problem analogously to the BM scenario and examine how
the bandwidth resource would be shared for mutual existence,
and how many primary and secondary users can coexist given
the available bandwidth resource.

Index Terms — Cooperative Cognitive Wireless Communica- 
tions; Ecological Systems. 



Fig. 1: Mutualistic Model for Cooperative Cognitive Radio
(Primary Users, Species I, and Secondary Users, Species II, linked by Mutualism).



I. Introduction 

Cooperative cognitive wireless communication (CCWC)
consists of two types of resources: licensed bandwidth owned
by primary users and relay bandwidth owned by secondary
users. Primary users can be modeled as one type of species
and secondary users as another type of species. In CCWC,
both user species are willing to lend bandwidth to each
other in exchange for the other's bandwidth resource. In
ecological terms, primary and secondary users are in a state of
mutualism, where cooperation is of mutual benefit [1].

A. Related work 

For systems whose participants behave strategically and
compete for resources, game theory can
best describe the allocation of resources and the equilibrium
position [2]. Within game theory, evolutionary game theory
can best describe the dynamism of a system [3]. A
wireless system is characterized by the moving and changing
positions of the mobile terminals. Therefore, for systems
such as wireless networks, evolutionary game theory can
answer many questions.

For the proposed model, the dynamical branch of game
theory is used to analyze cooperative cognitive systems.
The dynamical framework is best described by the
Lotka-Volterra equations, which work almost the same as the
replicator equations and are the foundation of ecological
modeling. In cooperative cognitive systems, primary and
secondary users benefit from each other's presence. The more
resources there are to share, the more they benefit. Figure 1
shows the mutualistic model for primary and secondary users.



B. Contributions 

The contributions of this paper can be summarized as 
follows: 

• The basic structure of cooperative cognitive ecological 
system is described. 

• The effects of population dynamism with varying number 
of primary and secondary users are discussed. 

• The ecological game, with analysis based on the Lotka-Volterra
equations, is given for a cooperative cognitive mobile
user population.

The rest of the paper is organized as follows: Section II
explains the system model. Section III analyzes the
density-dependent growth of mobile users. Section IV explains the
evolutionary game scenario for cooperative cognitive users and
formulates the game. Section V explains the Lotka-Volterra
competition and the results. The matrix game and local
stability are discussed in Section VI, and the summary of the
paper is given in Section ??.

II. System Model 

The proposed system model is given in Figure 2. The two
cooperative types of users are primary and secondary, labeled
species I and species II, respectively. The primary nodes (species
I) can lease their under-utilized bandwidth to the secondary
users, and the secondary users can act as cooperating relays
for the primary users, thus forming a mutualistic system in
the ecological sense.

Fig. 2: The System Model. (Legend: PT_i, i-th Primary Transmitter;
PR_i, i-th Primary Receiver; ST_i, i-th Secondary Transmitter;
SR_i, i-th Secondary Receiver; Secondary Users.)

One major quantity to estimate for the cooperative cognitive
communication system is the population dynamism of each
type of species, after the system has been translated into
mathematical ecological form. Mutualistic behavior in a natural
ecosystem is very complex because many types of species
influence each other's resources directly and indirectly. The
system here is simplified, but it remains complex because
of the spatial position of the cell, terrain conditions, cellular
population distribution, hardware complexity, and many types
of interference and noise. One possible type of ecosystem
is one where species exploit common resources and situations
like the tragedy of the commons occur. There, species are rivals
and try to exploit each other's resources. The proposed system
is based on a mutualistic model, where cooperation means the
survival of the interacting species. The benefit or payoff is
directly proportional to the existence and population of each
type of species.

III. Density Dependent Communication Growth 

For the simulation of the density-dependent communication
growth rate, linear negative feedback from mobile users is
assumed. The simulation starts with an initial user population
$N(0)$, cellular carrying capacity $K$, and per capita bandwidth
growth rate $r$.

Population dependency means that the number of users affects
the per capita bandwidth growth. So, if $K$ is the cellular
environment's carrying bandwidth capacity in terms of
individual users, then $K - N$ is the amount
of unused carrying bandwidth capacity, while $(K - N)/K$
is the remaining fractional carrying bandwidth capacity.
Therefore,

$$\frac{dN}{dt} = rN\left(\frac{K - N}{K}\right) \qquad (1)$$



If the user population is almost zero, then the carrying
bandwidth capacity is almost entirely unused, and
$\frac{1}{N}\frac{dN}{dt}$ is nearly equal to the per capita
bandwidth growth rate $r$. If the total number of mobile users
equals the carrying capacity, then the cellular environment is
fully occupied, and the rate of change of the number of users
with respect to time is zero.

Figure 3 shows how the continuous logistic mobile population
growth evolves. The simulation starts with 5 mobile users,
a carrying capacity of 500 mobile users, and a per capita
intrinsic growth rate greater than zero, and shows how the
system gradually accommodates new mobile users until it
reaches the carrying capacity of 500 mobile users.
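
The behavior described above can be reproduced with the closed-form solution of the logistic equation (1). This is a minimal sketch: the initial population of 5 users and the carrying capacity of 500 follow the text, while the growth rate r = 0.5 and the sampled time points are hypothetical, since the text only states that r is greater than zero.

```python
import math

def logistic_population(t, n0=5.0, k=500.0, r=0.5):
    """Closed-form solution N(t) of dN/dt = r * N * (K - N) / K."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))

# Sample the curve at a few (hypothetical) time points.
for t in (0, 5, 10, 20, 40):
    print(t, round(logistic_population(t), 1))
```

The population starts at N(0) = 5, grows almost exponentially while N is small, and then saturates at the carrying capacity K.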



Fig. 3: The Continuous Logistic Mobile Population Growth.
(Plot title: Continuous Logistic Population Growth; horizontal axis: Time (t).)



A. Using Evolutionary Game Theory 

In economic theory, strategic behavior is the conscious
behavior of competitors (players) who are aware of their
mutually conflicting interests and of the interdependence of
their decisions. A close analogue of this strategic behavior
can be seen in cooperative cognitive communication, in which
the primary and secondary (relay) nodes are the strategic
players, with conflicting interests in bandwidth allocation
(the resource) and in their decisions (the bandwidth allocated
to each other). The branch of economics dealing with this kind
of strategic interaction is called game theory, and it has
been very successful in describing the strategic interaction
of cellular nodes in cooperative communication. The same kind
of strategic interaction is present in cognitive radio between
primary and secondary users, where the stability point is
calculated in terms of the Nash equilibrium (no player is
better off by unilaterally changing its strategy).
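
The Nash condition can be made concrete with a small two-player
example. The sketch below enumerates the pure-strategy Nash
equilibria of a 2 x 2 game between one primary and one secondary
user. The payoff values and the strategy labels are hypothetical
and chosen only for illustration; they are not taken from the
paper's model.

```python
from itertools import product

# Hypothetical payoffs for illustration only (not taken from the paper).
# Rows: primary user strategy, columns: secondary user strategy.
# Strategy 0 = "cooperate" (share bandwidth / relay), 1 = "do not cooperate".
PRIMARY_PAYOFF = [[3, 0],
                  [2, 1]]
SECONDARY_PAYOFF = [[3, 2],
                    [0, 1]]

def is_nash(i, j, a, b):
    """True if neither player gains by unilaterally deviating from (i, j)."""
    primary_ok = all(a[i][j] >= a[k][j] for k in range(len(a)))
    secondary_ok = all(b[i][j] >= b[i][l] for l in range(len(b[0])))
    return primary_ok and secondary_ok

def pure_nash_equilibria(a, b):
    """Enumerate all pure-strategy Nash equilibria of the bimatrix game."""
    return [(i, j) for i, j in product(range(len(a)), range(len(a[0])))
            if is_nash(i, j, a, b)]

equilibria = pure_nash_equilibria(PRIMARY_PAYOFF, SECONDARY_PAYOFF)
print(equilibria)
```

With these assumed payoffs both mutual cooperation and mutual
non-cooperation satisfy the condition, which illustrates that a
Nash equilibrium need not be unique.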

Most of the work on game theory for cooperative cognitive
communication has dealt with the formulation and existence of
the Nash equilibrium. The statistical interpretation of the
populations of primary and secondary nodes has been neglected
in most of this work. Such population behavior was first
analyzed in [3], which studied populations of pre-programmed
players meeting strategically and discussed the evolutionary
stable strategy (ESS) criterion.
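
The ESS criterion can be illustrated with a small symmetric
matrix game. The sketch below applies the standard two-condition
ESS test to pure strategies. The Hawk-Dove payoffs (V = 4, C = 2)
are a textbook illustration assumed here and are not part of the
paper's cooperative cognitive model.

```python
def is_pure_ess(s, payoff):
    """Check the ESS conditions for pure strategy s.

    payoff[a][b] is the payoff to a player using a against a player using b.
    s is an ESS if, for every other strategy t, either
      payoff[s][s] > payoff[t][s], or
      payoff[s][s] == payoff[t][s] and payoff[s][t] > payoff[t][t].
    """
    for t in range(len(payoff)):
        if t == s:
            continue
        strict = payoff[s][s] > payoff[t][s]
        tie_break = (payoff[s][s] == payoff[t][s]
                     and payoff[s][t] > payoff[t][t])
        if not (strict or tie_break):
            return False
    return True

V, C = 4.0, 2.0  # assumed resource value and contest cost
HAWK, DOVE = 0, 1
PAYOFF = [[(V - C) / 2, V],
          [0.0, V / 2]]

hawk_is_ess = is_pure_ess(HAWK, PAYOFF)
dove_is_ess = is_pure_ess(DOVE, PAYOFF)
print(hawk_is_ess, dove_is_ess)
```

Because V exceeds C in this assumed example, a population of
Hawks cannot be invaded by Doves, while a population of Doves
can be invaded by Hawks.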
IV. Cooperative Cognitive Evolutionary Game (CCEG)

In the cooperative cognitive wireless communication (CCWC)
game, the competition for the resource is between the primary
and secondary users, and each type of user has its own
objective function, which needs to be maximized. Each primary
user in the pool of primary users wants to maximize its
utility by using an appropriate number of relays from the
secondary users, thus forming a cooperative communication
model. Primary users compete with other primary users as well
as in the selection of relays from the secondary users. The
secondary users, on the other hand, want to maximize their
utility by opportunistically obtaining a larger share of
dedicated bandwidth from the primary users.

Every game is characterized by its players, strategies and
payoffs. The players, strategies and payoffs for CCEG can be
described as follows:

• Players: In CCEG, the two distinct types of players are
the primary users and the secondary users, each of which
may also be called a species. Each individual member of
a species is a distinct player. So, in our proposed
system, there are two (2) species: M players in
species I (primary users) and N players in species II
(secondary users). This formulation leads to a total of
M × N pairs of individual players interacting
strategically with each other.

• Strategy: Each player of either species (primary or
secondary) is willing to sell or lease its bandwidth to
the members of the other species. In our model, primary
(licensed) bandwidth can be used by secondary nodes, and
relay (unlicensed) bandwidth can be used by the primary
nodes, using the cooperative technique. In addition to
the bandwidth being offered to members of the other
species, nodes also have to state the price they charge
in exchange for the bandwidth. So, the sets of strategic
pairs for primary and secondary users would be
$\{(w_{P_i}, p_{P_i})\}$ and $\{(w_{S_j}, p_{S_j})\}$,
respectively.

• Payoff: In evolutionary game theory, the payoff is also
referred to as the fitness. This fitness function is
determined by each player's net utility.

Even for the simplest model of species, the population
dynamics have complex behavior, which may include
bifurcation and chaos. When the cooperative cognitive system
is transformed into a mathematical ecological form, the major
task is the investigation of the population density dynamics
for both primary and secondary users. In a natural ecosystem,
many types of species interact with each other, forming a
complex dynamic system. Our proposed system consists of only
two species, yet it is still quite complex, involving many
different sources of noise, hardware complexity, spatial
position and distribution, etc. We also analyze how the
dynamics depend on density.

V. LOTKA-VOLTERRA COMPETITION

Density dependent growth models like the logistic equation
simulate an intra-specific competitive process; resources
become limiting as the population increases, and the per
capita growth rate declines. In this scenario, an additional
term is added to the logistic equation to represent
inter-specific density-dependent effects, and a pair of the
resulting expressions comprise the Lotka-Volterra competition
equations, which provide a simple and historically important
vehicle for thinking about competitive interactions. In the
Lotka-Volterra equations, the densities of both species are
subtracted from the carrying capacity to give a density
dependent feedback term, and the number of inter-specific
competitors is weighted by a term called the competition
coefficient, which varies with the species' similarity in
resource requirements. The basic Lotka-Volterra equations for
the CCWC system can be given as,



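$$\dot{x} = r_x x \left(1 - \frac{x}{K_x} - \alpha \frac{y}{K_x}\right) \tag{2}$$

$$\dot{y} = r_y y \left(1 - \frac{y}{K_y} - \beta \frac{x}{K_y}\right) \tag{3}$$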



where $x$ and $y$ represent the primary and secondary user
populations, $K_x$ and $K_y$ are the cellular carrying
capacities for primary and secondary users, $r_x$ and $r_y$
are the primary and secondary growth rates, $\alpha$ is the
competition coefficient describing the competitive effect of
secondary users on primary users, and $\beta$ is the
competition coefficient describing the competitive effect of
primary users on secondary users [4].

Figures 4 to 8 show some examples of Lotka-Volterra
mutualism.



Fig. 4: The Primary-Secondary Lotka-Volterra Mutualism I.



The parameters used for Figures 4-8 are: $x(0) = 10$,
$r_x = 0.9$, $K_x = 500$, $\alpha = 0.6$, $y(0) = 10$,
$r_y = 0.9$, $K_y = 500$, $\beta = 0.6$.
[Plot titles: Primary-Secondary Lotka-Volterra Mutualism; Lotka-Volterra Competition: Phase Plane]

Fig. 5: The Primary-Secondary Lotka-Volterra Mutualism II.

Fig. 6: The Primary-Secondary Lotka-Volterra Mutualism III.



Density-dependent communication growth is similar to
intra-species competition. As the population increases, a
decline in resource availability and in per capita
communication growth can be observed. The density-dependent
growth rate equation described earlier can be modified to
account for inter-species interaction. Equations of this type
are known as Lotka-Volterra equations [4].

The non-zero populations at equilibrium can be given as
follows:

$$x^* = \frac{K_x - \alpha K_y}{1 - \alpha\beta} \tag{4}$$

$$y^* = \frac{K_y - \beta K_x}{1 - \alpha\beta} \tag{5}$$

where the superscript * denotes the equilibrium user population.
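
As a worked check, assume the competition form that this section
describes, in which the weighted density of the other species is
subtracted from the carrying capacity. Setting both growth rates
to zero with non-zero populations then gives two zero-growth
isoclines, and their intersection is the non-zero equilibrium:

```latex
\begin{aligned}
\frac{dx}{dt} = 0 &\;\Rightarrow\; x + \alpha y = K_x, \\
\frac{dy}{dt} = 0 &\;\Rightarrow\; \beta x + y = K_y, \\
x^* = \frac{K_x - \alpha K_y}{1 - \alpha\beta}, &\qquad
y^* = \frac{K_y - \beta K_x}{1 - \alpha\beta}, \qquad \alpha\beta \neq 1 .
\end{aligned}
```

Under this assumed form, the parameter values quoted in the text
for Figures 9 and 10 would give approximately 318.2 primary
users and 345.5 secondary users at equilibrium.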

The parameters used for Figure 9 and Figure 10 are:
$x(0) = 30$, $r_x = 1.5$, $K_x = 560$, $\alpha = 0.7$,
$y(0) = 20$, $r_y = 3$, $K_y = 600$, $\beta = 0.8$.

Figure 9 shows the population dynamics for primary and
secondary users. The straight line corresponding to the
primary users is the zero net growth isocline for primary
users, while the straight line corresponding to the secondary
users is the zero net growth isocline for secondary users.
These are the lines along which the rate of change of the
respective population is zero ($\frac{dx}{dt} = 0 = \frac{dy}{dt}$).
Their intersection point is the equilibrium position. The
curve shows how the equilibrium position is reached, starting
the simulation from the initial populations of primary and
secondary users.

Fig. 7: The Primary-Secondary Lotka-Volterra Mutualism IV.

[Plot title: Lotka-Volterra Competition: Time Trajectory]

Fig. 8: The Primary-Secondary Lotka-Volterra Mutualism V.




[Axis labels: Secondary Users; Time (t)]

Fig. 9: The Primary-Secondary Lotka-Volterra Mutualism VI.

Similarly, Figure 10 shows the variation of both types of
mobile users over time, with respect to the inter-dependence
of the mutual bandwidth resource. The simulation is started
with almost the same number of each type of user. The
population of secondary users grows rapidly compared to that
of primary users. As time passes, the system converges to a
stable state, in which the primary and secondary user
populations settle at fixed values.

[Axis label: Secondary Users]

Fig. 10: The Primary-Secondary Lotka-Volterra Mutualism VII.
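
The time trajectory described for Figure 10 can be sketched
numerically. The code below assumes the competition form of the
Lotka-Volterra equations (the other population, weighted by the
competition coefficient, is subtracted from the carrying
capacity) and uses the parameter values quoted in the text for
Figures 9 and 10. The integrator (classical fourth-order
Runge-Kutta), the step size and the time horizon are choices made
here for illustration and are not stated in the source.

```python
def lv_rhs(x, y, rx, ry, kx, ky, alpha, beta):
    """Right-hand side of the assumed competition form of the model."""
    dx = rx * x * (kx - x - alpha * y) / kx
    dy = ry * y * (ky - y - beta * x) / ky
    return dx, dy

def simulate(x0, y0, params, t_end=60.0, dt=0.01):
    """Integrate with classical fourth-order Runge-Kutta; returns (x, y) samples."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(int(round(t_end / dt))):
        k1 = lv_rhs(x, y, **params)
        k2 = lv_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], **params)
        k3 = lv_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], **params)
        k4 = lv_rhs(x + dt * k3[0], y + dt * k3[1], **params)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        path.append((x, y))
    return path

# Parameter values quoted in the text for Figures 9 and 10.
PARAMS = dict(rx=1.5, ry=3.0, kx=560.0, ky=600.0, alpha=0.7, beta=0.8)
path = simulate(30.0, 20.0, PARAMS)

# Non-zero equilibrium of the assumed competition form.
denom = 1.0 - PARAMS["alpha"] * PARAMS["beta"]
x_star = (PARAMS["kx"] - PARAMS["alpha"] * PARAMS["ky"]) / denom
y_star = (PARAMS["ky"] - PARAMS["beta"] * PARAMS["kx"]) / denom
print(path[-1], (x_star, y_star))
```

Under these assumptions the simulated populations settle at the
analytical equilibrium, which matches the qualitative behavior
described for the figure.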

VI. Matrix Game and Local Stability 

To formulate the cooperative cognitive matrix game (CCMG),
the payoffs of the individuals of each species are
represented by a matrix. For our two-species game, two
matrices are formulated, representing the payoffs of the
primary and secondary users, respectively. If the total
number of primary users is M and the total number of
secondary users is N, then each of the two payoff matrices
is of order M × N. The matrix game in CCMG is not symmetric,
because each species has different benefits and pays
different prices for a pairwise coalition.
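
The two-matrix structure can be sketched as follows. All numbers,
the names `relay_gain`, `access_gain` and `price`, and the simple
"benefit plus or minus price" utility are hypothetical choices
made here for illustration. The source does not specify the
payoff function beyond stating that each species has its own
benefits and prices.

```python
# Hypothetical per-pair quantities, for illustration only.
M, N = 2, 3  # M primary users, N secondary users

# relay_gain[i][j]: benefit primary i obtains when secondary j relays for it.
relay_gain = [[4.0, 3.0, 5.0],
              [2.0, 6.0, 1.0]]
# access_gain[i][j]: benefit secondary j obtains from primary i's bandwidth.
access_gain = [[3.0, 2.0, 2.5],
               [4.0, 1.0, 3.5]]
# price[i][j]: net price paid by secondary j to primary i for that bandwidth.
price = [[1.0, 0.5, 1.5],
         [2.0, 0.5, 1.0]]

def payoff_matrices(relay_gain, access_gain, price):
    """Return (A, B): the M x N payoffs of primaries and of secondaries."""
    rows, cols = len(price), len(price[0])
    a = [[relay_gain[i][j] + price[i][j] for j in range(cols)]
         for i in range(rows)]
    b = [[access_gain[i][j] - price[i][j] for j in range(cols)]
         for i in range(rows)]
    return a, b

A, B = payoff_matrices(relay_gain, access_gain, price)
print(A)
print(B)
```

Entry (i, j) of each matrix refers to the coalition between
primary user i and secondary user j, and the two matrices differ
because the two species value the same coalition differently.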

Many types of coalition games can be formulated as matrix
games [5], and most evolutionary game theoretic approaches
are formulated as matrix games [3]. It is therefore feasible
to formulate the cooperative cognitive game as a matrix game.



VII. Summary 

Cooperative cognitive mutualism is the interaction of
primary and secondary users. More specifically, in this type
of coalition the primary users benefit the secondary users by
allowing them to use a fixed amount of bandwidth, while the
secondary users act as relays for the primary users, so that
each side gains an advantage from the other. A similar
definition of mutualism can be found in [6]. This type of
exchange is beneficial to both primary and secondary users.
The same type of mutualistic behavior can be observed in
[7], where pollen or nectar is spread with the services of
bees. This idea is also supported by the economic theory of
comparative advantage, in which abundant goods are exchanged
in trade among different types of species [8]. This
mutualistic behavior can be treated as a positive-positive
interaction. This interaction type has been described by
Lotka-Volterra equations.

Mutualism is defined as a relationship between two organisms
that benefits both. Cooperative cognitive mutualism (CCM)
carries costs and benefits for both primary and secondary
nodes. CCM is favorable when the mutual benefits are greater
than the individual costs, so it is the net benefit (or
benefit-cost ratio) that determines the outcome of these
interactions. If individual communication is not feasible
without the mutual cooperation of the primary and secondary
nodes, CCM is called obligatory cooperative cognitive
mutualism (OCCM). If the primary and secondary communication
can each survive in the absence of the other, CCM is called
facultative cooperative cognitive mutualism (FCCM). The CCM
relationship may not be symmetric. For example, either the
primary or the secondary node may be obligated to the
mutualism, while the other can live without its mutualistic
partner. CCM can be symbiotic, meaning that primary and
secondary communication are always found together. Based on
the Lotka-Volterra mutualistic system model, future work can
explain the bandwidth resource competition for fixed and
variable amounts of bandwidth from an ecological point of
view.



The equilibrium can be analyzed based on the availability
of both types of resources and the number of each type of
user. If there is a locally stable point for the co-existence
of both types of users, then the equilibrium is attained
after a number of iterations. The equilibrium conditions can
be obtained from this analysis. The variables that can be
varied are the amount of bandwidth resource and the number of
users of each type.

For the local stability of the equilibrium, the Jacobian
matrix of the primary and secondary users for the primary and
secondary types of bandwidth can be formulated. As an
example, two types of changing variables (mobile users,
bandwidth resource) for two types of users (primary,
secondary) competing for two types of resources (primary,
secondary) result in a Jacobian matrix of order 4x4. The
local stability criterion requires that all eigenvalues of
the Jacobian matrix have negative real parts.
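
The eigenvalue criterion can be illustrated on a smaller case
than the 4x4 system mentioned above. The sketch below assumes the
two-variable competition form of the Lotka-Volterra model
(populations only, with no separate bandwidth variables) and the
parameter values quoted in the text for Figures 9 and 10. It
evaluates the 2x2 Jacobian at the non-zero equilibrium and
computes its eigenvalues from the trace and determinant. It is a
simplified stand-in under these assumptions, not the 4x4
formulation itself.

```python
import cmath

def jacobian_at_equilibrium(rx, ry, kx, ky, alpha, beta):
    """2 x 2 Jacobian of the assumed competition model at (x*, y*)."""
    x_star = (kx - alpha * ky) / (1 - alpha * beta)
    y_star = (ky - beta * kx) / (1 - alpha * beta)
    # At the equilibrium kx - x - alpha*y = 0, so the diagonal entries
    # reduce to -r*x*/K; the off-diagonal entries carry alpha and beta.
    return [[-rx * x_star / kx, -rx * alpha * x_star / kx],
            [-ry * beta * y_star / ky, -ry * y_star / ky]]

def eigenvalues_2x2(j):
    """Eigenvalues from the characteristic polynomial l^2 - tr*l + det = 0."""
    tr = j[0][0] + j[1][1]
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0

J = jacobian_at_equilibrium(rx=1.5, ry=3.0, kx=560.0, ky=600.0,
                            alpha=0.7, beta=0.8)
eigs = eigenvalues_2x2(J)
locally_stable = all(l.real < 0 for l in eigs)
print(eigs, locally_stable)
```

Both eigenvalues have negative real parts for these values, so
the coexistence equilibrium of the assumed two-variable model is
locally stable.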